Abstract

We consider the problem of detecting a sparse Poisson mixture. Our results parallel those 
for the detection of a sparse normal mixture, pioneered by Ingster (1997) and Donoho and Jin 
(2004), when the Poisson means are larger than logarithmic in the sample size. In particular, 
a form of higher criticism achieves the detection boundary in the whole sparse regime. When 
the Poisson means are smaller than logarithmic in the sample size, a different regime arises in 
which simple multiple testing with Bonferroni correction is enough in the sparse regime. We 
present some numerical experiments that confirm our theoretical findings. 

Keywords: Sparse Poisson means model, goodness-of-fit tests, multiple testing, Bonferroni’s 
method, Fisher’s method, Pearson’s chi-squared test, Tukey’s higher criticism, sparse normal 
means model. 


1 Introduction 

The Poisson distribution is well suited to model count data in a broad variety of scientific and 
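engineering fields. In this paper, we consider a stylized detection problem where we observe $n$
independent Poisson counts $X_1, \dots, X_n$ from a mixture

\[ X_i \sim (1 - \varepsilon)\,\mathrm{Pois}(\lambda_i) + \tfrac{\varepsilon}{2}\,\mathrm{Pois}(\lambda_i') + \tfrac{\varepsilon}{2}\,\mathrm{Pois}(\lambda_i''), \tag{1} \]

where

\[ \lambda_i' = \lambda_i + \Delta_i, \quad \lambda_i'' = \max(0, \lambda_i - \Delta_i), \quad \text{for some } \Delta_i > 0, \tag{2} \]

and $\varepsilon \in [0,1]$ is the fraction of non-null effects. All the parameters are allowed to change with
$n$. We are interested in detecting whether there are any non-null effects in the sample. Specifically,
we know the null means $\lambda_1, \dots, \lambda_n$, and our goal is to test

\[ H_0 : \varepsilon = 0 \quad \text{versus} \quad H_1 : \varepsilon > 0. \tag{3} \]

Put differently, we want to address the following multiple hypotheses problem

\[ H_{0,i} : X_i \sim \mathrm{Pois}(\lambda_i) \quad \text{versus} \quad H_{1,i} : X_i \sim (1 - \varepsilon)\,\mathrm{Pois}(\lambda_i) + \tfrac{\varepsilon}{2}\,\mathrm{Pois}(\lambda_i') + \tfrac{\varepsilon}{2}\,\mathrm{Pois}(\lambda_i''). \]

We assume that $\varepsilon$ is the same for all $i$, but only for ease of exposition.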
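
To make the sampling scheme in (1) and (2) concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not code from the paper: the names `sample_poisson` and `sample_mixture` are hypothetical, and the Poisson sampler uses plain CDF inversion from the standard library, which is only adequate for moderate means (roughly below 700, so that exp(-mean) does not underflow).

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw one Poisson(mean) variate by CDF inversion (moderate means only)."""
    if mean <= 0:
        return 0
    u = rng.random()
    k = 0
    pmf = math.exp(-mean)  # P(X = 0)
    cdf = pmf
    # Walk up the CDF until it exceeds u; stop if the pmf underflows to 0.
    while u > cdf and pmf > 0.0:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k

def sample_mixture(null_means, deltas, eps, rng):
    """Draw X_1..X_n from the three-component mixture (1) with means as in (2)."""
    counts = []
    for lam, delta in zip(null_means, deltas):
        u = rng.random()
        if u < 1.0 - eps:
            mean = lam                      # null component
        elif u < 1.0 - eps / 2.0:
            mean = lam + delta              # elevated mean lambda'
        else:
            mean = max(0.0, lam - delta)    # lowered mean lambda''
        counts.append(sample_poisson(mean, rng))
    return counts
```

Setting `eps = 0.0` reproduces the null hypothesis in (3), since every coordinate is then drawn from its null component.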

This model may arise in goodness-of-fit testing for homogeneity in a Poisson process. Suppose 
we record the arrival time of alpha particles over a time period and we are interested in testing for 
uniformity. One way to do so is to partition the time period into non-overlapping intervals, and 
count how many particles arrived within each interval. These counts can be modeled by a Poisson
distribution. For this problem, and any other discrete goodness-of-fit testing problem, one would
typically use Pearson’s chi-squared test, but we show that, under some mild conditions, this test is
(grossly) suboptimal in the sparse regime where $\varepsilon = \varepsilon_n = o(1/\sqrt{n})$.

In another situation, we might be interested in detecting genes that are differentially expressed. 
Marioni et al. (2008) find that the variation of count data across technical replicates can be captured 
using a Poisson model when the over- (or under-) dispersion is not significant. Suppose we know 
the Poisson mean count for each gene expressed under normal conditions and want to detect a 
difference in expression under some other (treatment) condition. 

In the model (1) we consider here, the sparsity assumption is on the number of nonzero effects,
which on average is $n\varepsilon$. We assume that $\varepsilon \to 0$, so the number of nonzero effects is negligible
compared to the number $n$ of bins or genes being tested. So that there are some nonzero effects
under the alternative, we assume throughout the paper that

\[ n\varepsilon \to \infty. \tag{4} \]

We note that sparsity here has a different meaning from its use in the literature on sparse
multinomials (Holst, 1972; Morris, 1975), where the number of bins is large so that some bins
have small expected counts.

The Poisson sparse mixture model we consider here is analogous to the normal sparse mixture
model pioneered by Ingster (1997) and Donoho and Jin (2004), where the normal location family
$\mathcal{N}(\lambda, \lambda)$ plays the role of the Poisson family $\mathrm{Pois}(\lambda)$. (We note that in the normal model, one can
work with $\mathcal{N}(\mu, 1)$, $\mu = \sqrt{\lambda}$, without loss of generality, while such a reduction does not apply to the
Poisson model.) Our results for the Poisson model are completely parallel to those for the normal
model when the Poisson means are large enough that the normalized counts

\[ Z_i := (X_i - \lambda_i)/\sqrt{\lambda_i} \tag{5} \]

are uniformly well-approximated by the standard normal distribution under the null. Specifically,
we show that this is the case when

\[ \min_i \lambda_i \gg \log n. \tag{6} \]

(For two sequences $(a_n), (b_n) \subset \mathbb{R}_+$, $a_n \gg b_n$ means that $a_n/b_n \to \infty$.) In particular, we show
that multiple testing via the higher criticism, which Donoho and Jin (2004) developed based on an
idea of J. Tukey, is asymptotically optimal to first order, just as in the normal model. To show
this, we take care in approximating the tails of the Poisson distribution with the tails of the normal
distribution. This is done with standard moderate deviations bounds.

When the Poisson means are smaller, by which we mean

\[ \max_i \lambda_i \ll \log n, \tag{7} \]

we uncover a different regime where multiple testing via Bonferroni correction is optimal in the
sparse regime. In this regime, the normal approximation to the Poisson distribution is not uniformly
valid, and in fact not valid at all for those indices $i$ for which $\lambda_i$ remains fixed. We use large
deviations bounds to control the tails of the Poisson distribution.

In any case, we assume that the expected counts are lower bounded by a positive constant,
concretely

\[ \lambda_i \ge 1, \quad \forall i = 1, \dots, n. \tag{8} \]

This is to make the paper self-contained, and also because in practice it is common to pool together
bins to make the expected counts larger than some pre-specified minimum.
The remainder of the paper is organized as follows. In Section 2, we derive information lower
bounds under various conditions on the Poisson means. In Section 3, we study Pearson’s
chi-squared goodness-of-fit test and also the max test, which is closely related to multiple testing with
Bonferroni correction, showing that neither is optimal in all sparsity regimes. We then study
the higher criticism and show that it is optimal in all sparsity regimes, matching the information
bound to first order. In Section 4, we show the results of some numerical simulations to accompany
our theoretical findings. The proofs are gathered in Section 5. We then briefly touch on the
one-sided setting in Section 6. Section 7 is a discussion section.


2 Information Bounds 

We are particularly interested in regimes where the proportion of non-null effects tends to zero as
the sample size grows to infinity, i.e. $\varepsilon \to 0$ as $n \to \infty$. We follow the literature on the normal
sparse mixture model (Cai et al., 2011; Donoho and Jin, 2004; Ingster, 1997). We parameterize

\[ \varepsilon = n^{-\beta}, \quad \text{where } \beta \in (0,1) \text{ is fixed} \tag{9} \]

and consider two regimes where the detection problem behaves quite differently: the sparse regime
where $\beta \in (1/2, 1)$ and the dense regime where $\beta \in (0, 1/2)$. We then parameterize the Poisson
means in (1) differently in each regime. When the $\lambda_i$’s are relatively large, we are guided by the
correspondence between the normal model and the Poisson model via the normalized counts (5).

Suppose we know the fraction $\varepsilon$ and all null and non-null Poisson rates. By the Neyman-Pearson
fundamental lemma, the most powerful test for this simple versus simple hypothesis testing problem
is the likelihood ratio test (LRT). Hence the performance of the LRT gives an information bound
for this detection problem. We investigate this information bound by finding conditions under
which the risk (the sum of the probabilities of type I and type II errors) of the LRT goes to one as $n \to \infty$.
We say a test is asymptotically powerful when its risk tends to zero and asymptotically powerless
when its risk tends to one. All the limits are with respect to $n \to \infty$.


2.1 Dense Regime 

Guided by the correspondence with the normal model, in the dense regime where $\beta < 1/2$, we
parameterize the effects as follows

\[ \Delta_i = n^{s} \sqrt{\lambda_i}, \tag{10} \]

where $s \in \mathbb{R}$ is fixed. Define

\[ \rho_{\rm dense}(\beta) := \frac{\beta}{2} - \frac{1}{4}. \tag{11} \]

Proposition 1. Consider the testing problem (3) with parameterizations (9) with $\beta < 1/2$ and
(10). All tests are asymptotically powerless if

\[ s < \rho_{\rm dense}(\beta). \tag{12} \]

The expert will recognize the perfect correspondence with the detection boundary for the dense
regime in the two-sided detection problem in the normal model.
2.2 Sparse Regime

Guided by the correspondence with the normal model, in the sparse regime where $\beta > 1/2$, we
start by parameterizing the effects as follows

\[ \Delta_i = \sqrt{2 r \log n} \cdot \sqrt{\lambda_i}, \tag{13} \]

where $r \in (0,1)$ is fixed. Define

\[ \rho_{\rm sparse}(\beta) := \begin{cases} \beta - 1/2, & 1/2 < \beta \le 3/4, \\ (1 - \sqrt{1 - \beta})^2, & 3/4 < \beta < 1. \end{cases} \tag{14} \]

Proposition 2. Consider the testing problem (3) with parameterizations (9) with $\beta > 1/2$ and
(13) with (6). All tests are asymptotically powerless if

\[ r < \rho_{\rm sparse}(\beta). \tag{15} \]

Thus, Propositions 1 and 2 together show that, when (6) holds, meaning that $\min_i \lambda_i \gg \log n$,
the detection boundary for the Poisson model is in perfect correspondence with the detection
boundary for the normal model.
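
The two detection boundaries can be evaluated with a few lines of Python. This is an illustrative sketch with hypothetical function names; it takes the sparse boundary from (14) and, for the dense boundary, the value β/2 − 1/4, which is the threshold at which the second-moment bound in the proof of Proposition 1 stops tending to 1.

```python
import math

def rho_dense(beta):
    """Dense-regime boundary for the exponent s in (10); assumes 0 < beta < 1/2."""
    if not 0.0 < beta < 0.5:
        raise ValueError("dense regime requires 0 < beta < 1/2")
    return beta / 2.0 - 0.25

def rho_sparse(beta):
    """Sparse-regime boundary (14) for the parameter r in (13); assumes 1/2 < beta < 1."""
    if not 0.5 < beta < 1.0:
        raise ValueError("sparse regime requires 1/2 < beta < 1")
    if beta <= 0.75:
        return beta - 0.5
    return (1.0 - math.sqrt(1.0 - beta)) ** 2
```

For instance, `rho_sparse(0.75)` returns 0.25, and the second branch of (14) takes the same value at β = 3/4, so the boundary is continuous there.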

When the null means $(\lambda_i : i = 1, \dots, n)$ are smaller, a different detection boundary emerges in
the sparse regime. To better describe the detection boundary that follows, we adopt the following
parameterization

\[ \lambda_i' = \lambda_i^{1-\gamma} (\log n)^{\gamma}, \quad \lambda_i'' = 0, \quad \text{where } \gamma > 0 \text{ is fixed}. \tag{16} \]

Indeed, this particular case corresponds to $\Delta_i = \lambda_i^{1-\gamma} (\log n)^{\gamma} - \lambda_i$, and assuming the $\lambda_i$’s are smaller
than $\log n$ as we do, this implies that $\lambda_i'' = 0$, as it cannot be negative.

Proposition 3. Consider the testing problem (3) with parameterizations (9) with $\beta > 1/2$ and
(16) with (7) and (8). All tests are asymptotically powerless if $\gamma < \beta$.


3 Tests 

In this section we analyze some tests that are shown to achieve parts of the detection boundary. We 
find that the chi-squared test achieves the detection boundary in the dense regime, the test based 
on the maximum normalized count (which is closely related to multiple testing with Bonferroni 
correction) achieves the detection boundary in the very sparse regime, while multiple testing with 
the higher criticism achieves the detection boundary in all regimes. 

3.1 The chi-squared test 

We start by analyzing Pearson’s chi-squared test, which rejects for large values of

\[ D := \sum_{i=1}^n \frac{(X_i - \lambda_i)^2}{\lambda_i}. \tag{17} \]

The rationale behind using this test is two-fold. On the one hand, $D = \sum_i Z_i^2$, where the $Z_i$’s are
defined in (5), is the analog of the chi-squared test that plays a role in detecting a normal mean in
the dense regime. On the other hand, this is one of the most popular approaches for goodness-of-fit
testing if one interprets $X_1, \dots, X_n$ as the counts in a sample of size $N \sim \mathrm{Pois}(\sum_i \lambda_i)$ with values
in $\{1, \dots, n\}$.
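
As a small illustration, the statistic D can be computed as below. The sketch also centers and scales D by its exact null moments: under the null each term has mean 1 and variance 2 + 1/λ_i, which follows from the fourth central moment 3λ² + λ of a Poisson variable. The function names are hypothetical, and the standardized version is a convenience for reading the output, not a calibrated test.

```python
import math

def chi_squared_stat(counts, null_means):
    """Pearson statistic D = sum_i (X_i - lambda_i)^2 / lambda_i."""
    return sum((x - lam) ** 2 / lam for x, lam in zip(counts, null_means))

def standardized_chi_squared(counts, null_means):
    """(D - E_0 D) / sqrt(Var_0 D), using E_0 D = n and Var_0 D = sum_i (2 + 1/lambda_i)."""
    n = len(null_means)
    null_var = sum(2.0 + 1.0 / lam for lam in null_means)
    return (chi_squared_stat(counts, null_means) - n) / math.sqrt(null_var)
```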

Although we could state a more general result, we opt for simplicity and state a performance 
bound when the expected counts are not too small. 
Proposition 4. Consider the testing problem (3) with (8), and let $a_i = \Delta_i^2/\lambda_i$. Then the chi-squared
test is asymptotically powerful if

\[ \varepsilon \sum_i a_i \gg \sqrt{n} \quad \text{and} \quad \dots \tag{18} \]

and asymptotically powerless if

\[ \varepsilon \sum_i a_i \ll \sqrt{n} \quad \text{and} \quad \dots = o(n) \quad \text{and} \quad \dots = o(n^2). \tag{19} \]


From this, we immediately obtain the following result, which at once states that the chi-squared 
test achieves the detection boundary in the dense regime, and does not achieve the detection 
boundary in the sparse regime. 

Corollary 1. Consider the testing problem (3) with the lower bound (8). In the dense regime,
where $\beta < 1/2$ in (9) and under the parameterization (10), the chi-squared test is asymptotically
powerful when $s > \rho_{\rm dense}(\beta)$ defined in (11). In the sparse regime, where $\beta > 1/2$ in (9) and under
the parameterization (13), the chi-squared test is asymptotically powerless when $r$ is constant.

Other classical goodness-of-fit tests include the (generalized) likelihood ratio test and the
Freeman-Tukey test. Adapted to our context, the likelihood ratio test rejects for large values of

\[ \dots \tag{20} \]

while the Freeman-Tukey test rejects for large values of

\[ \sum_{i=1}^n \dots \tag{21} \]

We did not investigate these tests in detail, but partial work suggests that they are (as expected)
equivalent to the chi-squared test in the regimes we are most interested in.

3.2 The max test

In analogy with the normal model, we consider the max test, which rejects for large values of

\[ M = \max_{i=1,\dots,n} |Z_i|, \tag{22} \]

where the $Z_i$’s are defined in (5).

Proposition 5. Consider the testing problem (3), parameterized by (9) and (13) with (6). When
$r > (1 - \sqrt{1 - \beta})^2$, the max test is asymptotically powerful.

Hence, the max test achieves the detection boundary (14) in the very sparse regime where
$\beta \in (3/4, 1)$. We speculate that, just as in the normal model, the max test does not achieve the
detection boundary when $\beta < 3/4$.
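
The max statistic in (22) is straightforward to compute. The sketch below uses hypothetical function names and also returns √(2 log n), the familiar first-order size of the maximum of n standard normal variables. That value is shown only as a rough reference point when the normal approximation (6) is in force; it is an assumption of this illustration and not a calibrated critical value.

```python
import math

def max_stat(counts, null_means):
    """M = max_i |X_i - lambda_i| / sqrt(lambda_i), the max of the normalized counts."""
    return max(abs(x - lam) / math.sqrt(lam) for x, lam in zip(counts, null_means))

def normal_reference_level(n):
    """sqrt(2 log n): first-order size of the max of n standard normals (reference only)."""
    return math.sqrt(2.0 * math.log(n))
```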
3.3 The higher criticism test

In the normal model, Donoho and Jin (2004) advocate a test based on the normalized empirical
process of the $Z_i$’s. In our case, these variables are not identically distributed. It would then make
sense to convert them to P-values, and we comment on that in Section 3.4. For now, we opt
for the following definition

\[ T^* = \sup_{z \in \mathcal{Z}_n} T(z), \quad T(z) := \frac{\sum_i \left( \mathbb{1}\{|Z_i| > z\} - K_{\lambda_i}(z) \right)}{\sqrt{\sum_i K_{\lambda_i}(z)(1 - K_{\lambda_i}(z))}}, \tag{23} \]

where, with $\Upsilon_\lambda$ denoting a Poisson random variable with mean $\lambda$,

\[ K_\lambda(z) := \mathbb{P}\left( |\Upsilon_\lambda - \lambda|/\sqrt{\lambda} > z \right), \quad \mathcal{Z}_n := \Big\{ z \in \mathbb{N} : \sum_i K_{\lambda_i}(z)(1 - K_{\lambda_i}(z)) \ge \log n \Big\}. \]

The higher criticism test rejects for large values of $T^*$. This definition extends the
higher criticism of Donoho and Jin (2004), in particular the variant HC+, to the case where the
test statistics are not identically distributed under the null and cannot be transformed to be so.
The discretization of the supremum makes the control under the null particularly simple.
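
The following Python sketch shows one way to evaluate this statistic: a centered count of exceedances, divided by its null standard deviation, maximized over integer thresholds z whose null variance is at least log n. It is an illustration under stated assumptions rather than the authors’ implementation: the names `K` and `hc_star` are hypothetical, the search starts at z = 1 and stops at an arbitrary cap `z_max`, and the exceedance probabilities are obtained by summing the Poisson pmf directly.

```python
import math

def pois_pmf(k, lam):
    """Poisson probability mass function, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def K(lam, z):
    """P(|Y - lam| / sqrt(lam) > z) for Y ~ Pois(lam)."""
    half_width = z * math.sqrt(lam)
    lo = max(0, math.ceil(lam - half_width))
    hi = math.floor(lam + half_width)
    inside = sum(pois_pmf(k, lam) for k in range(lo, hi + 1))
    return min(1.0, max(0.0, 1.0 - inside))

def hc_star(counts, null_means, z_max=30):
    """Max over eligible integer z of the standardized exceedance count."""
    n = len(counts)
    cache = {}
    best = float("-inf")  # returned as-is if no threshold is eligible
    for z in range(1, z_max + 1):
        num = 0.0
        var = 0.0
        for x, lam in zip(counts, null_means):
            if (lam, z) not in cache:
                cache[(lam, z)] = K(lam, z)
            k = cache[(lam, z)]
            exceed = 1.0 if abs(x - lam) / math.sqrt(lam) > z else 0.0
            num += exceed - k
            var += k * (1.0 - k)
        # Only thresholds with enough null variance are allowed.
        if var >= math.log(n):
            best = max(best, num / math.sqrt(var))
    return best
```

Restricting to thresholds with null variance at least log n keeps the denominator away from zero, which is what makes the statistic easy to control under the null.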


Proposition 6. Consider the testing problem (3), parameterized by (9) and (13) with (6). When
$r > \rho_{\rm sparse}(\beta)$, the higher criticism test is asymptotically powerful.

We speculate that, just as in the normal model, the higher criticism is also able to achieve the 
detection boundary in the dense regime. 


3.4 Multiple testing: Fisher, Bonferroni and Tukey 

We now take a multiple testing perspective. In multiple testing jargon, our null hypothesis $H_0$ is
the complete null, since

\[ H_0 = \bigcap_{i=1}^n H_{0,i}. \]

Several definitions of P-values are possible here. We define the P-value for the $i$th
hypothesis testing problem as follows

\[ p_i = G_{\lambda_i}(X_i), \quad \text{where } G_\lambda(x) := \mathbb{P}\left( |\Upsilon_\lambda - \lambda| \ge |x - \lambda| \right). \tag{24} \]

There does not seem to be a consensus on the definition of P-value for asymmetric discrete null
distributions (Dunne et al., 1996). We speculate that any reasonable definition leads to the same
asymptotic results in our context. We note that the $p_i$’s are independent, but they are discrete,
and therefore not uniformly distributed in $(0,1)$ under the complete null. In fact, they are not even
identically distributed unless the $\lambda_i$’s are all equal. That said, for each $i$, the null distribution of $p_i$
stochastically dominates the uniform distribution.

Lemma 1. (Lehmann and Romano, 2005, Lem 3.3.1) For any $\lambda > 0$,

\[ \mathbb{P}\left( G_\lambda(\Upsilon_\lambda) \le u \right) \le u, \quad \forall u \in (0,1). \]

With P-values now defined, we can draw from the literature on multiple comparisons and make
correspondences with the tests that we studied in the previous sections.
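
A two-sided P-value of the form (24), namely the null probability that a Poisson count lies at least as far from its mean as the observed count, can be computed exactly by summing the Poisson pmf. The sketch below is an illustration with hypothetical function names; it assumes the observed count is a nonnegative integer and reflects it around the mean to find the opposite tail.

```python
import math

def pois_pmf(k, lam):
    """Poisson probability mass function, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def pois_cdf(k, lam):
    """P(Y <= k) for Y ~ Pois(lam); zero for negative k."""
    if k < 0:
        return 0.0
    return min(1.0, sum(pois_pmf(j, lam) for j in range(k + 1)))

def two_sided_pvalue(x, lam):
    """P(|Y - lam| >= |x - lam|) for Y ~ Pois(lam) and an integer count x."""
    if x >= lam:
        upper = x
        lower = math.floor(2 * lam - x)  # mirror image of x below the mean
    else:
        upper = math.ceil(2 * lam - x)   # mirror image of x above the mean
        lower = x
    p = (1.0 - pois_cdf(upper - 1, lam)) + pois_cdf(lower, lam)
    return min(1.0, p)  # the two tails overlap only when x equals the mean
```

The accompanying check of Lemma 1 is simple: for a fixed mean, the total null mass of the counts whose P-value is at most u never exceeds u.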
Fisher’s method

The chi-squared test is, in our context, intimately related to multiple testing with Fisher’s method,
which rejects the complete null for large values of

\[ -2 \sum_{i=1}^n \log p_i. \tag{25} \]

We speculate that, like Pearson’s chi-squared test, Fisher’s method achieves the detection boundary
in the dense regime. We were able to prove it in the simpler one-sided setting. Details are postponed
to Section 6.
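
Fisher’s combination statistic is easy to compute from a list of P-values, as in the sketch below (function names are hypothetical). For independent, exactly uniform P-values the statistic has a chi-squared distribution with 2n degrees of freedom, hence mean 2n and variance 4n. With discrete P-values that stochastically dominate the uniform distribution, as in Lemma 1, that reference distribution is conservative, so the z-score below is only a rough guide.

```python
import math

def fisher_stat(pvals):
    """Fisher's statistic -2 * sum_i log p_i."""
    tiny = 1e-300  # guards against log(0) if a P-value underflows numerically
    return -2.0 * sum(math.log(max(p, tiny)) for p in pvals)

def fisher_zscore(pvals):
    """Center and scale by the chi-squared(2n) moments: mean 2n, variance 4n."""
    n = len(pvals)
    return (fisher_stat(pvals) - 2.0 * n) / math.sqrt(4.0 * n)
```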

Bonferroni’s method

The max test is, in turn, intimately related to multiple testing with Bonferroni’s method, which
rejects the (complete) null for small values of

\[ \min_i p_i. \]

In fact, the two procedures are identical when the $\lambda_i$’s are all equal. One can show that Proposition 5
applies to the Bonferroni test also. Instead of formally proving this, we focus on complementing
the lower bound established in Proposition 3.

Proposition 7. Consider the testing problem (3) with parameterizations (9) with $\beta > 1/2$ and
(16) with (7). When $\gamma > \beta$, the Bonferroni test is asymptotically powerful.

We note that the same is true if we merely focus on the large $Z_i$’s, meaning, if we replace the
two-sided P-values $p_i$ with

\[ p_i^{+} = G^{+}_{\lambda_i}(X_i), \quad \text{where } G^{+}_\lambda(x) := \mathbb{P}(\Upsilon_\lambda \ge x). \tag{26} \]

In fact, one cannot exploit the assumption that $\lambda_i'' = 0$ for all $i$. Indeed, if we consider the test that
rejects for large values of $Y := \#\{i : X_i = 0\}$, it is asymptotically powerless. This follows from an
application of Lemma 5. By a simple application of Lyapunov’s central limit theorem and (8), $Y$
is asymptotically normal both under the null and the alternative. Moreover,

\[ \mathbb{E}_0(Y) = \sum_i e^{-\lambda_i}, \quad \operatorname{Var}_0(Y) = \sum_i e^{-\lambda_i}(1 - e^{-\lambda_i}) \ge (1 - \dots \]

where we used (8) and (7), while

\[ \dots \]

and, after some simple calculations using (8),

\[ \operatorname{Var}_0(Y) \le \operatorname{Var}_1(Y) \le (1 - \varepsilon/2)^2 \operatorname{Var}_0(Y) + n\varepsilon/2 \le \operatorname{Var}_0(Y) + \dots \]

We can easily check that the conditions of Lemma 5 are satisfied when $\beta > 1/2$.
Tukey’s higher criticism

This brings us back to the higher criticism, which in some sense is an intermediate method between
Fisher’s and Bonferroni’s methods. Donoho and Jin (2004) attribute to Tukey the idea of testing
the complete null based on the maximum of the normalized empirical process of the P-values, which
equivalently leads to rejecting for large values of

\[ \max_i \frac{\sqrt{n}\,\left( i/n - p_{(i)} \right)}{\sqrt{p_{(i)}(1 - p_{(i)})}}, \tag{27} \]

where $p_{(1)} \le \dots \le p_{(n)}$ are the sorted P-values. In our context, where the P-values are close to, but
not exactly, uniformly distributed, we can show that the test based on (27) achieves the detection
boundary when all the $\lambda_i$’s are equal. (Details are omitted.) When this is not so, we are not able
to conclude that this is still the case.
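
For reference, the classical form of the higher criticism computed from sorted P-values, the maximum over i of √n (i/n − p_(i)) / √(p_(i)(1 − p_(i))), can be sketched as follows. The function name is hypothetical, and the choice to maximize over every index whose sorted P-value lies strictly between 0 and 1 is an assumption of this sketch; the range of the maximum is not specified in the text.

```python
import math

def higher_criticism(pvals):
    """Max over i of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    n = len(pvals)
    best = float("-inf")  # returned as-is if every P-value is 0 or 1
    for i, p in enumerate(sorted(pvals), start=1):
        if 0.0 < p < 1.0:  # skip values where the denominator vanishes
            value = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
            best = max(best, value)
    return best
```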




4 Simulations 


We present the results of some numerical experiments whose purpose is to see the behavior of the
various tests in finite samples. So that the asymptotic analysis is relevant, we chose to work with $n = 10^4$
and $n = 10^6$. In some bioinformatics/genetics applications, $n$ could be in the millions. We compare
the tests in terms of their power when the level is controlled at $\alpha = 0.05$ by simulation. (We
generate the test statistic 500 times under the null and take the $(1 - \alpha)$-quantile as the critical
value.) The power against a particular alternative is then obtained empirically from 200 repeats.
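
The calibration just described, simulating the statistic under the null, taking the (1 − α)-quantile as critical value, then estimating power by the rejection frequency under the alternative, can be sketched as follows. This is a minimal illustration and not the authors’ simulation code: all function names are hypothetical, the Poisson sampler is a simple CDF-inversion routine suited to moderate means, and the chi-squared statistic stands in for any of the tests.

```python
import math
import random

def sample_poisson(mean, rng):
    """Poisson(mean) by CDF inversion (moderate means only)."""
    if mean <= 0:
        return 0
    u, k = rng.random(), 0
    pmf = math.exp(-mean)
    cdf = pmf
    while u > cdf and pmf > 0.0:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k

def simulate_counts(null_means, deltas, eps, rng):
    """Counts from the three-component mixture; eps = 0 gives the null."""
    counts = []
    for lam, delta in zip(null_means, deltas):
        u = rng.random()
        if u < 1.0 - eps:
            mean = lam
        elif u < 1.0 - eps / 2.0:
            mean = lam + delta
        else:
            mean = max(0.0, lam - delta)
        counts.append(sample_poisson(mean, rng))
    return counts

def chi_squared_stat(counts, null_means):
    return sum((x - lam) ** 2 / lam for x, lam in zip(counts, null_means))

def critical_value(stat, null_means, alpha, n_null, rng):
    """Empirical (1 - alpha)-quantile of the statistic over n_null null draws."""
    zeros = [0.0] * len(null_means)
    draws = sorted(stat(simulate_counts(null_means, zeros, 0.0, rng), null_means)
                   for _ in range(n_null))
    index = min(n_null - 1, math.ceil((1.0 - alpha) * n_null) - 1)
    return draws[index]

def empirical_power(stat, null_means, deltas, eps, crit, n_rep, rng):
    """Fraction of alternative draws whose statistic exceeds the critical value."""
    hits = sum(stat(simulate_counts(null_means, deltas, eps, rng), null_means) > crit
               for _ in range(n_rep))
    return hits / n_rep
```

The same two functions apply unchanged to any statistic with the signature `stat(counts, null_means)`, so the tests can be compared at a common simulated level.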

We note that, for the higher criticism, we work with the P-values defined in (24) and their
corresponding null distribution $F_i(t) := \mathbb{P}(G_{\lambda_i}(\Upsilon_{\lambda_i}) \le t)$, that is,

\[ \mathrm{HC} = \max_{t \in \mathcal{T}} \frac{\sum_{i=1}^n \left( \mathbb{1}\{p_i \le t\} - F_i(t) \right)}{\sqrt{\sum_{i=1}^n F_i(t)(1 - F_i(t))}}, \tag{28} \]

where $\mathcal{T} := \{t \in (0,1) : 1/n \le F_i(t) \le 1/2, \ i = 1, \dots, n\}$. We note that (28) is a generalized form
of Tukey’s higher criticism (27) for the case where the $p_i$’s are not identically distributed. Thus we
find (28) more natural than (23), but the two are very closely related and the latter is more easily
amenable to mathematical analysis. In practice, we estimate $\mathcal{T}$ by simulation.


4.1 In the dense regime 

In the dense regime, we have (9) with $\beta \in (0, 1/2)$ and the parameterization (2) with (10).

In the first set of experiments, we investigate how the test performance matches the theoretical
information boundary (11). We set $n = 10^6$, all the $\lambda_i$’s equal to $\lambda_0 = 15 > \log(n) \approx 14$, and
vary $\beta$ in the range $(0, 0.5)$ with 0.025 increments and $s$ in the range $[-0.5, 0]$ with 0.025
increments. When the $\lambda_i$’s are all equal, Bonferroni’s method is equivalent to the max test, and
is therefore omitted. The results are summarized in Figure 1. We see that the phase transition
phenomenon is clear. The performance of the chi-squared test and Fisher’s method are
similar and comparable with the higher criticism, and achieve the asymptotic detection boundary.
As expected, the max test has hardly any power in the dense regime. We note that very similar
trends are observed in the normal means model.

In the second set of experiments, we generate settings where the $\lambda_i$’s are different. We take
$n = 10^4$ and fix $\beta = 0.2$, and the $\lambda_i$’s are generated iid from $\lambda_0 + \mathrm{Exp}(\lambda_0)$, where $\mathrm{Exp}(\lambda)$ denotes
the exponential distribution with mean $\lambda$, and we let $\lambda_0 \in \{1, 10, 100\}$. The results are summarized
in Figure 2. We can see the chi-squared test and Fisher’s method perform similarly and are the
best, closely followed by the higher criticism. The max test and Bonferroni’s method perform
similarly and poorly, as expected. The effect of $\lambda_0$ does not seem important.


[Figure 1 panels: Chi-squared test; Fisher’s method. Horizontal axis: $\beta$.]

Figure 1: Simulation results in the dense regime, with $n = 10^6$ and all $\lambda_i$’s equal to $\lambda_0 = 15$. The
blue line is the information boundary (11).

[Figure 2 panels: $\lambda_0 = 1$; $\lambda_0 = 10$; $\lambda_0 = 100$. Horizontal axis: $s$.]

Figure 2: Simulation results in the dense regime, with $n = 10^4$, $\beta = 0.2$, and the $\lambda_i$’s generated iid
from $\lambda_0 + \mathrm{Exp}(\lambda_0)$. The vertical dotted line is the detection threshold.
4.2 In the sparse regime

In the sparse regime, we have (9) with $\beta \in (1/2, 1)$ and the parameterization (2) with (13). The
experiments are otherwise parallel to those performed in the dense regime.

In the first set of experiments, we set $n = 10^6$, means all equal to $\lambda_0 = 15$, and vary $\beta$ in the
range $[0.5, 1]$ with increments of 0.025, and $r$ in the range $[0, 1]$ with increments of 0.05. The results
are summarized in Figure 3. While the chi-squared test is not competitive, as expected, we can see
that the higher criticism has more power in the moderately sparse regime where $\beta \in (0.5, 0.75)$,
while the max test is clearly the best in the very sparse regime where $\beta \in (0.75, 1)$. The asymptotic
detection boundary is seen to be fairly accurate, although less so as $\beta$ approaches 1, where the
asymptotics take longer to come into effect. (For example, when $n = 10^6$ and $\beta = 0.9$, there are
only $n\varepsilon \approx 4$ anomalies.) We note that very similar trends are observed in the normal means
model.

In the second set of experiments, we set $n = 10^4$ and $\beta = 0.6$ (moderately sparse) or $\beta = 0.8$
(very sparse), and the $\lambda_i$’s are generated iid from $\lambda_0 + \mathrm{Exp}(\lambda_0)$, where $\lambda_0 \in \{1, 10, 100\}$. The
simulation results are reported in Figure 4 and Figure 5. We can see that the max test and
Bonferroni’s method perform similarly, and dominate in the very sparse regime. The chi-squared
test is somewhat better than Fisher’s method, and in some measure competitive in the moderately
sparse regime, but essentially powerless in the very sparse regime. The higher criticism is the clear
winner in the moderately sparse regime, as expected, and holds its own in the very sparse regime,
although clearly inferior to the max test. Comparing the results for different $\lambda_0$, we may conclude
that, in the sparse regime, smaller counts (i.e., small $\lambda_0$) make the problem more difficult, at
least in this finite sample setting.


[Figure 3 panels: Chi-squared test; Max test; HC. Horizontal axis: $\beta$, from 0.5 to 1.0.]

Figure 3: Simulation results in the sparse regime, with $n = 10^6$ and all $\lambda_i$’s equal to $\lambda_0 = 15$. The
blue line is the information boundary (14). The dashed blue curve for the max test is the boundary
that it can achieve.


[Figure 4 panels: $\lambda_0 = 1$; $\lambda_0 = 10$; $\lambda_0 = 100$. Horizontal axis: $r$.]

Figure 4: Simulation results in the moderately sparse regime, with $n = 10^4$, $\beta = 0.6$, and the $\lambda_i$’s
generated iid from $\lambda_0 + \mathrm{Exp}(\lambda_0)$. The vertical dotted line is the detection threshold.

[Figure 5 panels: $\lambda_0 = 1$; $\lambda_0 = 10$; $\lambda_0 = 100$. Horizontal axis: $r$.]

Figure 5: Simulation results in the very sparse regime, with $n = 10^4$, $\beta = 0.8$, and the $\lambda_i$’s generated
iid from $\lambda_0 + \mathrm{Exp}(\lambda_0)$. The vertical dotted line is the detection threshold.


5 Proofs

For $a, b \in \mathbb{R}$, let $a \wedge b = \min(a, b)$ and $a \vee b = \max(a, b)$. For two sequences of reals $(a_n)$ and
$(b_n)$: $a_n \sim b_n$ when $a_n/b_n \to 1$; $a_n = o(b_n)$ when $a_n/b_n \to 0$; $a_n = O(b_n)$ when $a_n/b_n$ is bounded;
$a_n \asymp b_n$ when $a_n = O(b_n)$ and $b_n = O(a_n)$; $a_n \ll b_n$ when $a_n = o(b_n)$. Finally, $a_n \approx b_n$ when
$|a_n/b_n| \vee |b_n/a_n| = O((\log n)^{\kappa})$ for some $\kappa \in \mathbb{R}$. We use similar notation with a superscript $P$ when
the sequences $(a_n)$ and $(b_n)$ are random. In particular, $a_n = O_P(b_n)$ means that $a_n/b_n$ is bounded
in probability, i.e., $\sup_n \mathbb{P}(|a_n/b_n| > x) \to 0$ as $x \to \infty$, and $a_n = o_P(b_n)$ means that $a_n/b_n \to 0$ in
probability.

When $X$ and $Y$ are random variables, $X \sim Y$ means they have the same distribution. For a
random variable $X$ and distribution $F$, $X \sim F$ means that $X$ has distribution $F$. For a sequence of
random variables $(X_n)$ and a distribution $F$, $X_n \rightharpoonup F$ means that $X_n$ converges in distribution to $F$.
Everywhere, we identify a distribution and its cumulative distribution function. For a distribution
$F$, $\bar{F}(x) = 1 - F(x)$ will denote its survival function. We say that an event $E_n$ holds with high
probability (w.h.p.) if $\mathbb{P}(E_n) \to 1$ as $n \to \infty$.

We let $\mathbb{P}_0, \mathbb{E}_0, \operatorname{Var}_0$ (resp. $\mathbb{P}_{0,i}, \mathbb{E}_{0,i}, \operatorname{Var}_{0,i}$) and $\mathbb{P}_1, \mathbb{E}_1, \operatorname{Var}_1$ (resp. $\mathbb{P}_{1,i}, \mathbb{E}_{1,i}, \operatorname{Var}_{1,i}$) denote the
probability, expectation and variance under the null (resp. null at observation $i$) and alternative
(resp. alternative at observation $i$), respectively. Recall that $\Upsilon_\lambda$ denotes a random variable with
the Poisson distribution with mean $\lambda$, denoted $P_\lambda$, so that for a set $A$, $P_\lambda(A) = \mathbb{P}(\Upsilon_\lambda \in A)$.

5.1 Preliminaries 

We state here a few results that will be used later on in the proofs of the main results stated earlier 
in the paper. We start with a couple of facts about the Poisson distribution. 

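The following are moderate deviation bounds for the Poisson distribution $\mathrm{Pois}(\lambda)$ as $\lambda \to \infty$.

Lemma 2. Let $a : (0, \infty) \to (0, \infty)$ be such that $a(\lambda) \to \infty$ and $a(\lambda)/\lambda \to 0$ as $\lambda \to \infty$. Then

\[ \lim_{\lambda \to \infty} \frac{1}{a(\lambda)} \log \mathbb{P}\left( \Upsilon_\lambda \ge \lambda + \sqrt{\lambda\, a(\lambda)} \right) = -\frac{1}{2} \]

and

\[ \lim_{\lambda \to \infty} \frac{1}{a(\lambda)} \log \mathbb{P}\left( \Upsilon_\lambda \le \lambda - \sqrt{\lambda\, a(\lambda)} \right) = -\frac{1}{2}. \]

Proof. We focus on the first statement. Let $m = \lfloor \lambda \rfloor$ and take $Y_1, \dots, Y_{m+1}$ iid Poisson with mean
1. Fixing $\varepsilon \in (0,1)$, we have

\[ \mathbb{P}\left( \Upsilon_\lambda \ge \lambda + \sqrt{\lambda\, a(\lambda)} \right) \le \mathbb{P}\Big( \sum_{i=1}^m Y_i + Y_{m+1} \ge m + \sqrt{m\, a(\lambda)} \Big) \le \mathrm{I} + \mathrm{II}, \]

where

\[ \mathrm{I} := \mathbb{P}\Big( \sum_{i=1}^m (Y_i - 1) \ge (1 - \varepsilon) \sqrt{m\, a(\lambda)} \Big), \quad \mathrm{II} := \mathbb{P}\left( Y_{m+1} \ge \varepsilon \sqrt{m\, a(\lambda)} \right), \]

where in the first inequality we used the fact that $\Upsilon_\lambda$ is stochastically bounded from above by
$\sum_{i=1}^{m+1} Y_i$, and in the second inequality we used the union bound. By (Dembo and Zeitouni, 1998,
Th 3.7.1),

\[ \frac{1}{a(\lambda)} \log \mathrm{I} \to -\frac{(1 - \varepsilon)^2}{2}, \quad m \to \infty. \]

And using the fact that $\mathbb{P}(Y_1 \ge x)/\mathbb{P}(Y_1 = x) \to 1$ as $x \to \infty$, we have

\[ \log \mathrm{II} = \log \mathbb{P}\left( Y_1 = \lceil \varepsilon \sqrt{m\, a(\lambda)} \rceil \right) + o(1) \sim -\varepsilon \sqrt{m\, a(\lambda)} \log \sqrt{m\, a(\lambda)}, \quad m \to \infty. \]

Since $a(\lambda) = o(m)$, we have that $\mathrm{II} = o(\mathrm{I})$, and conclude that

\[ \limsup_{\lambda \to \infty} \frac{1}{a(\lambda)} \log \mathbb{P}\left( \Upsilon_\lambda \ge \lambda + \sqrt{\lambda\, a(\lambda)} \right) \le -\frac{(1 - \varepsilon)^2}{2}, \]

and because $\varepsilon > 0$ is arbitrary, we may take $\varepsilon = 0$ in this last display. The reverse inequality is
proved similarly. □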

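The following are concentration bounds for the Poisson distribution. For a real $x$, let $\lceil x \rceil$
(resp. $\lfloor x \rfloor$) denote the smallest (resp. largest) integer greater (resp. smaller) than or equal to $x$.

Lemma 3. For $x \ge 0$, define $h(x) = x \log(x) - x + 1$, with $h(0) = 1$ by continuity. Then, for any $\lambda > 0$,

\[ -\lambda h(\lceil x \rceil/\lambda) - \tfrac{1}{2} \log \lceil x \rceil - 1 \le \log \mathbb{P}(\Upsilon_\lambda \ge x) \le -\lambda h(x/\lambda), \quad \forall x \ge \lambda, \]

and

\[ -\lambda h(\lfloor x \rfloor/\lambda) - \tfrac{1}{2} \log \lfloor x \rfloor - 1 \le \log \mathbb{P}(\Upsilon_\lambda \le x) \le -\lambda h(x/\lambda), \quad \forall\, 0 \le x \le \lambda. \]

Proof. The upper bounds result from a straightforward application of Chernoff’s bound. For the
first lower bound, take $x \ge \lambda$ and let $m = \lceil x \rceil$. Then

\[ \log \mathbb{P}(\Upsilon_\lambda \ge x) \ge \log \mathbb{P}(\Upsilon_\lambda = m) = \log \frac{e^{-\lambda} \lambda^m}{m!} \ge -\lambda h(m/\lambda) - \tfrac{1}{2} \log m - 1, \]

using the fact that $m! \le m^{m + 1/2} e^{-m + 1}$. The second lower bound is proved similarly. □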
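
The upper-tail bounds of Lemma 3 can be checked numerically. The sketch below (hypothetical function names) computes the exact log tail probability by summing the Poisson pmf and compares it with the Chernoff upper bound −λh(x/λ) and with the lower bound −λh(m/λ) − ½ log m − 1, where m is the smallest integer at least x. The lower bound follows from P(Υ ≥ x) ≥ P(Υ = m) and the Stirling-type inequality m! ≤ m^(m+1/2) e^(−m+1). Truncating the tail sum after `n_terms` terms is an assumption that is harmless for the moderate values used here.

```python
import math

def h(t):
    """h(t) = t log t - t + 1, extended by continuity with h(0) = 1."""
    return t * math.log(t) - t + 1.0 if t > 0 else 1.0

def pois_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def log_upper_tail(x, lam, n_terms=500):
    """log P(Y >= x) for Y ~ Pois(lam), by direct (truncated) summation."""
    m = math.ceil(x)
    return math.log(sum(pois_pmf(k, lam) for k in range(m, m + n_terms)))

def upper_tail_bounds(x, lam):
    """(lower, upper) bounds on log P(Y >= x), valid for x >= lam."""
    m = math.ceil(x)
    lower = -lam * h(m / lam) - 0.5 * math.log(m) - 1.0
    upper = -lam * h(x / lam)
    return lower, upper
```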
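The following is Berry-Esseen’s theorem applied to the Poisson distribution $\mathrm{Pois}(\lambda)$ as $\lambda \to \infty$.

Lemma 4. There is a universal constant $C > 0$ such that

\[ \sup_{x \in \mathbb{R}} \left| \mathbb{P}\left( \frac{\Upsilon_\lambda - \lambda}{\sqrt{\lambda}} \le x \right) - \Phi(x) \right| \le C/\sqrt{\lambda}, \]

where $\Phi$ denotes the standard normal distribution function.

Proof. Let $m = \lceil \lambda \rceil$ be the smallest integer greater than or equal to $\lambda$. It is enough to prove the
result when $\lambda \ge 1$, in which case $1/2 \le \lambda/m \le 1$. Take $Y_1, \dots, Y_m$ iid $\mathrm{Pois}(\lambda/m)$, so that
$\Upsilon_\lambda \sim \sum_{i=1}^m Y_i$. We have $\mathbb{E}(Y_i) = \operatorname{Var}(Y_i) = \lambda/m$ and $\mathbb{E}(|Y_i - \lambda/m|^3) \le \mathbb{E}(Y_i^3) < \infty$. The result
now follows by the Berry-Esseen theorem. □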


The following lemma is standard, and appears for example in (Arias-Castro and Wang, 2013).

Lemma 5. Consider a test that rejects for large values of a statistic $T_n$ with finite second moment,
both under the null and alternative hypotheses. Then the test that rejects when $T_n \ge t_n := \mathbb{E}_0(T_n) + \frac{R_n}{2} \sqrt{\operatorname{Var}_0(T_n)}$ is asymptotically powerful if

\[ R_n := \frac{\mathbb{E}_1(T_n) - \mathbb{E}_0(T_n)}{\sqrt{\operatorname{Var}_1(T_n) \vee \operatorname{Var}_0(T_n)}} \to \infty. \tag{29} \]

Assume in addition that $T_n$ is asymptotically normal, both under the null and alternative hypotheses.
Then the test is asymptotically powerless if

\[ \frac{\mathbb{E}_1(T_n) - \mathbb{E}_0(T_n)}{\sqrt{\operatorname{Var}_0(T_n)}} \to 0 \quad \text{and} \quad \frac{\operatorname{Var}_1(T_n)}{\operatorname{Var}_0(T_n)} \to 1. \tag{30} \]

Finally, we state without proof the following simple result.

Lemma 6. The function $f(\beta) = (1 - \sqrt{1 - \beta})^2 - (\beta - 1/2)$ is nonnegative and strictly increasing
on $(3/4, 1)$.
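
Although Lemma 6 is stated without proof, the short calculation behind it may help the reader. The following worked derivation is a sketch added for illustration, with $f(\beta) = (1 - \sqrt{1 - \beta})^2 - (\beta - 1/2)$, the difference between the two branches of (14).

```latex
\begin{align*}
f(\beta) &= 1 - 2\sqrt{1-\beta} + (1 - \beta) - \beta + \tfrac{1}{2}
          = \tfrac{5}{2} - 2\beta - 2\sqrt{1-\beta}, \\
f'(\beta) &= -2 + \frac{1}{\sqrt{1-\beta}}, \\
f(3/4) &= \tfrac{5}{2} - \tfrac{3}{2} - 2 \cdot \tfrac{1}{2} = 0 .
\end{align*}
% f'(\beta) > 0 exactly when \sqrt{1-\beta} < 1/2, i.e. when \beta > 3/4.
% Hence f is strictly increasing on (3/4, 1), starts from f(3/4) = 0,
% and is therefore positive there, with f(\beta) \to 1/2 as \beta \to 1.
```

In particular, $f$ maps $(3/4, 1)$ onto $(0, 1/2)$, which is why an inverse $f^{-1}(\delta)$ is available for any $\delta \in (0, 1/2)$.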


5.2 Proof of Proposition 1 

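Here we use the second moment method without truncation, which amounts to proving that
$\operatorname{Var}_0(L) \to 0$, or equivalently, $\mathbb{E}_0(L^2) \le 1 + o(1)$, where $L$ is the likelihood ratio

\[ L = \prod_{i=1}^n L_i, \quad \text{where} \quad L_i := \frac{(1 - \varepsilon) P_{\lambda_i}(X_i) + \frac{\varepsilon}{2} P_{\lambda_i'}(X_i) + \frac{\varepsilon}{2} P_{\lambda_i''}(X_i)}{P_{\lambda_i}(X_i)}. \tag{31} \]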
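We have $\mathbb{E}_0(L^2) = \prod_{i=1}^n \mathbb{E}_0(L_i^2)$, where

\begin{align*}
\mathbb{E}_0(L_i^2)
&= \sum_{x=0}^{\infty} \frac{\left[ (1 - \varepsilon) P_{\lambda_i}(x) + \frac{\varepsilon}{2} P_{\lambda_i'}(x) + \frac{\varepsilon}{2} P_{\lambda_i''}(x) \right]^2}{P_{\lambda_i}(x)} \\
&= \sum_{x=0}^{\infty} \frac{\left[ (1 - \varepsilon) e^{-\lambda_i} \frac{\lambda_i^x}{x!} + \frac{\varepsilon}{2} e^{-\lambda_i'} \frac{\lambda_i'^x}{x!} + \frac{\varepsilon}{2} e^{-\lambda_i''} \frac{\lambda_i''^x}{x!} \right]^2}{e^{-\lambda_i} \frac{\lambda_i^x}{x!}} \\
&= (1 - \varepsilon)^2 + 2(1 - \varepsilon)\varepsilon + \frac{\varepsilon^2}{4} e^{-2\lambda_i' + \lambda_i + \lambda_i'^2/\lambda_i} + \frac{\varepsilon^2}{4} e^{-2\lambda_i'' + \lambda_i + \lambda_i''^2/\lambda_i} + \frac{\varepsilon^2}{2} e^{-\lambda_i' - \lambda_i'' + \lambda_i + \lambda_i' \lambda_i''/\lambda_i} \\
&= 1 + \frac{\varepsilon^2}{2} \left[ \left( e^{\Delta_i^2/\lambda_i} - 1 \right) + \left( e^{-\Delta_i^2/\lambda_i} - 1 \right) \right] \\
&= 1 + a_n, \quad \text{where } a_n := \varepsilon^2 \left[ \cosh(n^{2s}) - 1 \right].
\end{align*}

In the third line we used the fact that $\sum_{x \ge 0} A^x/x! = e^A$ for all $A \in \mathbb{R}$, and in the fourth line we
used (10). Condition (12) and the fact that $\beta < 1/2$ imply that $s < 0$, and a Taylor expansion
gives $a_n \le \varepsilon^2 n^{4s}$ eventually. We deduce that $\mathbb{E}_0(L^2) \le (1 + a_n)^n$, and the RHS tends to 1 when
$n a_n \to 0$, which is the case because of (12).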
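
The per-coordinate second moment computed in this proof, the sum over x of the squared mixture pmf divided by the null pmf, has the closed form 1 + ε²[cosh(Δ²/λ) − 1] when Δ < λ, so that the lowered mean is λ − Δ. This can be confirmed numerically with the sketch below. The function names are hypothetical, and truncating the series at `x_max` is an assumption that is harmless for moderate means.

```python
import math

def pois_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def second_moment_numeric(lam, delta, eps, x_max=150):
    """Sum over x of [mixture pmf]^2 / [null pmf]; requires 0 < delta < lam."""
    lam_up, lam_down = lam + delta, lam - delta
    total = 0.0
    for x in range(x_max + 1):
        p0 = pois_pmf(x, lam)
        if p0 == 0.0:  # skip terms lost to floating-point underflow
            continue
        mix = ((1.0 - eps) * p0
               + 0.5 * eps * pois_pmf(x, lam_up)
               + 0.5 * eps * pois_pmf(x, lam_down))
        total += mix * mix / p0
    return total

def second_moment_closed_form(lam, delta, eps):
    """1 + eps^2 * (cosh(delta^2 / lam) - 1)."""
    return 1.0 + eps ** 2 * (math.cosh(delta ** 2 / lam) - 1.0)
```

With the dense parameterization (10), Δ²/λ equals n^(2s), which recovers the quantity a_n used in the proof.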


5.3 Proof of Proposition 2 

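We use the truncated second moment method of Ingster in the form put forth by Butucea et al.
(2013). Define

\[ x_i = \lambda_i + \sqrt{2(1 + \eta) \log(n)} \sqrt{\lambda_i}, \quad y_i = \lambda_i - \sqrt{2(1 + \eta) \log(n)} \sqrt{\lambda_i}, \]

where $\eta > 0$ is chosen small enough that (33) and (34) hold simultaneously.
Define the truncated likelihood function,

\[ \tilde{L} = \prod_{i=1}^n L_i \mathbb{1}\{A_i\}, \quad A_i := \{y_i \le X_i \le x_i\}, \]

where $L_i$ is defined in (31). As in Butucea et al. (2013), it suffices to prove that

\[ \mathbb{E}_0(\tilde{L}) \ge 1 + o(1) \quad \text{and} \quad \mathbb{E}_0(\tilde{L}^2) \le 1 + o(1). \]

First moment. We have

\[ \mathbb{E}_0(\tilde{L}) = \prod_{i=1}^n \mathbb{E}_0(L_i \mathbb{1}\{A_i\}) = \prod_{i=1}^n \mathbb{P}_{1,i}(A_i), \]

with

\[ \mathbb{P}_{1,i}(A_i^c) = (1 - \varepsilon) P_{\lambda_i}(A_i^c) + \tfrac{\varepsilon}{2} P_{\lambda_i'}(A_i^c) + \tfrac{\varepsilon}{2} P_{\lambda_i''}(A_i^c). \]

Applying Lemma 2, using (13) and the fact that $\lambda_i' \sim \lambda_i'' \sim \lambda_i \gg \log n$ because of (6), we get

\[ P_{\lambda_i}(A_i^c) \le P_{\lambda_i'}(A_i^c) \vee P_{\lambda_i''}(A_i^c) \le n^{-(\sqrt{1 + \eta} - \sqrt{r})^2 + o(1)}, \]

uniformly over $i = 1, \dots, n$. Hence,

\[ \mathbb{P}_{1,i}(A_i) \ge 1 - a_n, \quad \text{for some } a_n \le \varepsilon\, n^{-(\sqrt{1 + \eta} - \sqrt{r})^2 + o(1)}, \]

which in turn implies

\[ \mathbb{E}_0(\tilde{L}) \ge (1 - a_n)^n. \]

Using the expression for $\varepsilon$, we have

\[ n a_n \le n^{1 - \beta - (\sqrt{1 + \eta} - \sqrt{r})^2 + o(1)}. \]

By (15) and Lemma 6, for any $\beta \in (1/2, 1)$, we have $r < \rho_{\rm sparse}(\beta) \le (1 - \sqrt{1 - \beta})^2 < (\sqrt{1 + \eta} - \sqrt{1 - \beta})^2$,
which in turn implies that $1 - \beta - (\sqrt{1 + \eta} - \sqrt{r})^2 < 0$. Therefore, $n a_n = o(1)$, and so
$\mathbb{E}_0(\tilde{L}) \ge 1 - o(1)$.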


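Second moment. We have

\[ \mathbb{E}_0(\tilde{L}^2) = \prod_{i=1}^n \mathbb{E}_0(L_i^2 \mathbb{1}\{A_i\}), \]

where

\begin{align*}
\mathbb{E}_0(L_i^2 \mathbb{1}\{A_i\})
&= \sum_{y_i \le x \le x_i} \frac{\left[ (1 - \varepsilon) P_{\lambda_i}(x) + \frac{\varepsilon}{2} \left( P_{\lambda_i'}(x) + P_{\lambda_i''}(x) \right) \right]^2}{P_{\lambda_i}(x)} \\
&= \sum_{y_i \le x \le x_i} \left[ (1 - \varepsilon)^2 P_{\lambda_i}(x) + \varepsilon(1 - \varepsilon) \left( P_{\lambda_i'}(x) + P_{\lambda_i''}(x) \right) + \frac{\varepsilon^2}{4} \frac{\left( P_{\lambda_i'}(x) + P_{\lambda_i''}(x) \right)^2}{P_{\lambda_i}(x)} \right] \\
&\le (1 - \varepsilon)^2 + 2\varepsilon(1 - \varepsilon) + \frac{\varepsilon^2}{2} \sum_{y_i \le x \le x_i} \frac{P_{\lambda_i'}(x)^2 + P_{\lambda_i''}(x)^2}{P_{\lambda_i}(x)} \\
&= 1 - \varepsilon^2 + \frac{\varepsilon^2}{2} \left[ \sum_{y_i \le x \le x_i} e^{-2\lambda_i' + \lambda_i} \frac{(\lambda_i'^2/\lambda_i)^x}{x!} + \sum_{y_i \le x \le x_i} e^{-2\lambda_i'' + \lambda_i} \frac{(\lambda_i''^2/\lambda_i)^x}{x!} \right] \\
&\le 1 + \frac{\varepsilon^2}{2} e^{\Delta_i^2/\lambda_i} \left[ P_{\lambda_i'^2/\lambda_i}([0, x_i]) + P_{\lambda_i''^2/\lambda_i}([y_i, \infty)) \right] \\
&\le 1 + \frac{1}{2} n^{-2\beta + 2r} \left[ P_{\lambda_i'^2/\lambda_i}([0, x_i]) + P_{\lambda_i''^2/\lambda_i}([y_i, \infty)) \right]. \tag{32}
\end{align*}

In the third line we used the fact that $(a + b)^2 \le 2a^2 + 2b^2$ for all $a, b \in \mathbb{R}$.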

Let 5 = /0sparse(/3) — r, which is strictly positive by (15) 

Case 1. When /3 < 3/4, —2/3 + 2r = —1 — 5, and we can bound the 2nd term in (32) by n~^~^. 
Case 2. When /3 > 3/4, we distinguish two sub-cases. Let / be the function defined in 
Lemma 6. In the first case, 5 > 1/2, in which case —2/3-1-2r = —1 — 2[(5 —/(/3)] < —1 for any /3 < 1, 
so that we can bound the 2nd term in (32) by . In the second case, <5 < 1/2, so that 

f~^{6) exists in (3/4,1). If /3 < then f{/3) < 6 and the same bound on the 2nd term in 

(32) applies. If /3 > f~^{S), we have r = psparse(/3) - <5 > Pspa.rse{f~^{S)) - 6 = f~^{6) - 1/2 > 1/4. 
Fix 7/ > 0 small enough that 

r\S)-l/2>{l + rj)/A. (33) 

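Since $\lambda_i' \sim \lambda_i'' \sim \lambda_i \gg \log n$,

$$\lambda_i'^2/\lambda_i = \lambda_i + 2\sqrt{2r\log(n)}\,\sqrt{\lambda_i}\,(1+o(1)) \quad \text{and} \quad \lambda_i''^2/\lambda_i = \lambda_i - 2\sqrt{2r\log(n)}\,\sqrt{\lambda_i}\,(1+o(1)).$$

Hence,

$$\mathbb{P}_{\lambda_i'^2/\lambda_i}([0, x_i]) = \mathbb{P}_{\lambda_i'^2/\lambda_i}\big(Z_i \le -(2\sqrt{r} - \sqrt{1+\eta})\sqrt{2\log(n)}\,(1+o(1))\big) \le n^{-(2\sqrt{r} - \sqrt{1+\eta})^2 + o(1)}$$

and

$$\mathbb{P}_{\lambda_i''^2/\lambda_i}([y_i, \infty)) = \mathbb{P}_{\lambda_i''^2/\lambda_i}\big(Z_i \ge (2\sqrt{r} - \sqrt{1+\eta})\sqrt{2\log(n)}\,(1+o(1))\big) \le n^{-(2\sqrt{r} - \sqrt{1+\eta})^2 + o(1)},$$

because of Lemma 2, and the fact that $2\sqrt{r} > \sqrt{1+\eta}$ by our choice of $\eta$ in (33). We can thus bound the 2nd term in (32) by

$$n^{2r - 2\beta - (2\sqrt{r} - \sqrt{1+\eta})^2 + o(1)}.$$

When $\eta = 0$, the exponent is equal to

$$2r - 2\beta - (2\sqrt{r} - 1)^2 = -1 - 2\big(\beta - 1 + (1 - \sqrt{r})^2\big) < -1 - 2\big(\beta - 1 + (1 - \sqrt{\rho_{\rm sparse}(\beta)})^2\big) = -1.$$

Hence, when $\eta > 0$ is small enough,

$$2r - 2\beta - (2\sqrt{r} - \sqrt{1+\eta})^2 < -1. \qquad (34)$$

We conclude that $\mathbb{E}_0(L_i^2 \mathbb{1}_{A_i}) \le 1 + o(n^{-1})$, uniformly in $i$, which implies that

$$\mathbb{E}_0(\tilde L^2) \le (1 + o(n^{-1}))^n = 1 + o(1).$$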
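The exponent algebra at the end of this proof can be checked numerically. The following is a minimal Python sketch, not part of the original argument. It assumes the piecewise form of the sparse boundary suggested by (14) and Lemma 6, namely $\beta - 1/2$ for $\beta \le 3/4$ and $(1-\sqrt{1-\beta})^2$ for $\beta > 3/4$; the function names are illustrative only.

```python
import math

def rho_sparse(beta):
    """Assumed sparse detection boundary for 1/2 < beta < 1 (illustrative)."""
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie in (1/2, 1)")
    if beta <= 0.75:
        return beta - 0.5
    return (1.0 - math.sqrt(1.0 - beta)) ** 2

def exponent(r, beta, eta=0.0):
    """Exponent 2r - 2*beta - (2*sqrt(r) - sqrt(1 + eta))^2 from the proof."""
    return 2.0 * r - 2.0 * beta - (2.0 * math.sqrt(r) - math.sqrt(1.0 + eta)) ** 2

def exponent_identity(r, beta):
    """Closed form -1 - 2*(beta - 1 + (1 - sqrt(r))^2), valid when eta = 0."""
    return -1.0 - 2.0 * (beta - 1.0 + (1.0 - math.sqrt(r)) ** 2)

if __name__ == "__main__":
    beta = 0.9
    r = rho_sparse(beta) - 0.05  # a point strictly below the boundary
    # The two expressions agree when eta = 0.
    print(exponent(r, beta), exponent_identity(r, beta))
    # A small eta keeps the exponent below -1, as required in (34).
    print(exponent(r, beta, eta=0.01) < -1.0)
```

At $r = \rho_{\rm sparse}(\beta)$ and $\eta = 0$ the exponent equals $-1$ exactly, which is why a strict gap $r < \rho_{\rm sparse}(\beta)$ leaves room for a small positive $\eta$.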


5.4 Proof of Proposition 3 

The proof parallels that of Proposition 2. Here we define

$$x_i = (1 + c)\,\frac{\log n}{\log(C_i)}, \qquad C_i := \frac{\log n}{\lambda_i},$$

where $c$ is a small positive constant that will be chosen later on, and consider the following truncated likelihood

$$\tilde L = \prod_{i=1}^n L_i \mathbb{1}_{A_i}, \qquad A_i := \{X_i \le x_i\}.$$


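First moment. Taking into account the fact that $\lambda_i'' = 0$, it suffices to prove that

$$\mathbb{P}_{\lambda_i}(A_i^c) + \varepsilon\,\mathbb{P}_{\lambda_i'}(A_i^c) = o(1/n),$$

uniformly over $i = 1,\dots,n$, where $A_i^c = \{X_i > x_i\}$. Let $h(t) = t\log t - t + 1$. There is $t_0$ such that, for $t \ge t_0$, $h((1+c)t) \ge (1 + c/2)\,t\log t$. Note that $x_i/\lambda_i \ge C_i/\log(C_i) \ge C_{\min}/\log(C_{\min}) \to \infty$, eventually, since (7) implies $C_{\min} := \min_i C_i \to \infty$. Hence, using Lemma 3, we get

$$\log \mathbb{P}_{\lambda_i}(A_i^c) \le -\lambda_i h(x_i/\lambda_i) \le -\lambda_i (1 + c/2)\,\frac{C_i}{\log(C_i)}\,\log\Big(\frac{C_i}{\log(C_i)}\Big) \le -(1 + c/3)\log n,$$

as soon as $C_{\min}/\log(C_{\min})$ is large enough. This implies that $\max_i \mathbb{P}_{\lambda_i}(A_i^c) = o(1/n)$.

Note that $(\log n)/\lambda_i' = C_i^{1-\gamma}$. So we also have $x_i/\lambda_i' \ge C_{\min}^{1-\gamma}/\log(C_{\min}) \to \infty$ eventually, and using Lemma 3, we get

$$\log \mathbb{P}_{\lambda_i'}(A_i^c) \le -\lambda_i' h(x_i/\lambda_i') \le -\lambda_i' (1 + c/2)\,\frac{C_i^{1-\gamma}}{\log(C_i)}\,\log\Big(\frac{C_i^{1-\gamma}}{\log(C_i)}\Big) \le -(1 + c/3)(1-\gamma)\log n,$$

as soon as $C_{\min}^{1-\gamma}/\log(C_{\min})$ is large enough. Since $\gamma < \beta$ by assumption, this implies $\varepsilon \max_i \mathbb{P}_{\lambda_i'}(A_i^c) = o(1/n)$.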
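The tail bounds in this step have the Chernoff form $\mathbb{P}(\Upsilon_\lambda \ge x) \le \exp(-\lambda h(x/\lambda))$ for $x \ge \lambda$, with $h(t) = t\log t - t + 1$ as above. The sketch below compares that bound with the exact Poisson tail. It assumes this is the upper bound meant by Lemma 3 (the lemma itself is stated elsewhere in the paper), and the helper names are hypothetical.

```python
import math

def h(t):
    """h(t) = t*log(t) - t + 1 for t > 0, with h(0) = 1 by continuity."""
    return 1.0 if t == 0 else t * math.log(t) - t + 1.0

def pois_pmf(k, lam):
    """Poisson(lam) probability mass at the integer k (lam > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def pois_sf(x, lam):
    """Upper tail P(Poisson(lam) >= x) for an integer x."""
    if x <= 0:
        return 1.0
    if x <= lam:
        # Near or below the mean: complement of the (short) lower sum.
        return 1.0 - sum(pois_pmf(k, lam) for k in range(x))
    # Above the mean: add the decreasing upper-tail terms directly.
    total, k, term = 0.0, x, pois_pmf(x, lam)
    while term > 0.0 and term >= 1e-17 * total:
        total += term
        k += 1
        term *= lam / k
    return total

def chernoff_upper(x, lam):
    """Chernoff bound exp(-lam * h(x / lam)), meaningful for x >= lam."""
    return math.exp(-lam * h(x / lam))

if __name__ == "__main__":
    lam = 0.5
    for x in (1, 3, 6, 10):
        print(x, pois_sf(x, lam), chernoff_upper(x, lam))
```

For fixed small $\lambda$ the two columns decay at the same exponential rate in $x\log x$, which is what drives the choice of the truncation level $x_i$ in the proof.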

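Second moment. Taking into account the fact that $\lambda_i'' = 0$, it suffices to prove that

$$\varepsilon^2 e^{\lambda_i} + \varepsilon^2 e^{(\lambda_i' - \lambda_i)^2/\lambda_i}\,\mathbb{P}_{\lambda_i'^2/\lambda_i}([0, x_i]) = o(1/n),$$

uniformly over $i = 1,\dots,n$. We quickly see that

$$\varepsilon^2 e^{\lambda_i} = o(1/n),$$

since $\beta > 1/2$ is fixed. For the other term, we distinguish two cases.

Case 1. First, assume that $\gamma < 1/2$. Then, since $\lambda_i'^2/\lambda_i = C_i^{2\gamma - 1}\log n = o(\log n)$,

$$\varepsilon^2 e^{(\lambda_i' - \lambda_i)^2/\lambda_i} \le \varepsilon^2 e^{\lambda_i'^2/\lambda_i} = n^{-2\beta + o(1)} = o(1/n).$$

Case 2. Now, assume that $\gamma \ge 1/2$. Then $\lambda_i'^2/(\lambda_i x_i) \ge \log(C_{\min})/(1+c) \to \infty$, so that applying Lemma 3, we get

$$\log \mathbb{P}_{\lambda_i'^2/\lambda_i}([0, x_i]) \le -\frac{\lambda_i'^2}{\lambda_i}\,h\big(x_i\lambda_i/\lambda_i'^2\big) = x_i \log\big(\lambda_i'^2/(\lambda_i x_i)\big) + x_i - \frac{\lambda_i'^2}{\lambda_i},$$

with

$$x_i \log\big(\lambda_i'^2/(\lambda_i x_i)\big) \le (1 + c)(\log n)\Big[(2\gamma - 1) + \frac{\log\log C_{\min}}{\log C_{\min}}\Big], \qquad (35)$$

so that

$$\varepsilon^2 e^{(\lambda_i' - \lambda_i)^2/\lambda_i}\,\mathbb{P}_{\lambda_i'^2/\lambda_i}([0, x_i]) \le \exp\Big[-2\beta\log n - 2\lambda_i' + \lambda_i + x_i\log\big(\lambda_i'^2/(\lambda_i x_i)\big) + x_i\Big] \le n^{-2\beta + (1+c)(2\gamma - 1) + o(1)},$$

uniformly over $i = 1,\dots,n$, since in addition to (35), we also have $-2\lambda_i' + \lambda_i + x_i \le x_i \le (1 + c)\log n/\log C_{\min} = o(\log n)$. Since $\gamma < \beta$, we may choose $c > 0$ small enough that $-2\beta + (1 + c)(2\gamma - 1) < -1$.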


5.5 Proof of Proposition 4 

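We have

$$\mathbb{E}(\Upsilon_\lambda) = \lambda, \quad \operatorname{Var}(\Upsilon_\lambda) = \lambda, \quad \mathbb{E}(\Upsilon_\lambda - \lambda)^3 = \lambda, \quad \mathbb{E}(\Upsilon_\lambda - \lambda)^4 = 3\lambda^2 + \lambda.$$

Using this, for the Poisson model (1), we have

$$\mathbb{E}_0(D) = n, \qquad \mathbb{E}_1(D) = n + \varepsilon\sum_{i=1}^n \frac{\Delta_i^2}{\lambda_i}, \qquad \operatorname{Var}_0(D) = 2n + \sum_{i=1}^n \frac{1}{\lambda_i},$$

and, after some simple but tedious calculations,

$$\operatorname{Var}_1(D) = \operatorname{Var}_0(D) + \varepsilon R,$$

where

$$R := \sum_{i=1}^n \Big[\frac{4\Delta_i^2}{\lambda_i} + \frac{7\Delta_i^2}{\lambda_i^2} + \frac{(1-\varepsilon)\Delta_i^4}{\lambda_i^2}\Big] \le C\sum_{i=1}^n (a_i^2 + a_i^4),$$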
i=l 
for some universal constant $C > 0$, using (8). We have $\mathbb{E}_1(D) - \mathbb{E}_0(D) = \varepsilon\sum_{i=1}^n a_i^2$ and $\operatorname{Var}_0(D) \vee \operatorname{Var}_1(D) \le 2n + \sum_{i=1}^n 1/\lambda_i + C\varepsilon\sum_{i=1}^n (a_i^2 + a_i^4)$. Because of (8), we have $\sum_{i=1}^n 1/\lambda_i = O(n)$ and then, by

(18), we have eX^ILi With this and the second part of (18), it becomes straightforward 

to see that the first part of Lemma 5 applies and we conclude that way.
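The null moments of $D$ reduce to per-coordinate facts: under $\mathrm{Pois}(\lambda)$, the standardized square $(X - \lambda)^2/\lambda$ has mean $1$ and variance $2 + 1/\lambda$, which follows from the fourth central moment $3\lambda^2 + \lambda$. The sketch below verifies this by direct summation and also evaluates the mean under a shifted Poisson. It is an illustrative check with hypothetical function names, not code from the paper.

```python
import math

def pois_pmf(k, lam):
    """Poisson(lam) probability mass at the integer k (lam > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def standardized_square_moments(lam, delta=0.0):
    """Mean and variance of (X - lam)^2 / lam when X ~ Poisson(lam + delta).

    The sums are truncated far in the upper tail, where the mass is negligible.
    """
    mu = lam + delta
    kmax = int(mu + 40 * math.sqrt(mu) + 40)
    m1 = m2 = 0.0
    for k in range(kmax + 1):
        p = pois_pmf(k, mu)
        w = (k - lam) ** 2 / lam
        m1 += p * w
        m2 += p * w * w
    return m1, m2 - m1 ** 2

if __name__ == "__main__":
    lam = 4.0
    print(standardized_square_moments(lam))        # null: mean 1, variance 2 + 1/lam
    print(standardized_square_moments(lam, 2.0))   # shifted mean: (mu + delta^2) / lam
```

Summing the per-coordinate values over $i$ gives $\mathbb{E}_0(D) = n$ and $\operatorname{Var}_0(D) = 2n + \sum_i 1/\lambda_i$ by independence.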

We now prove that the chi-squared test is asymptotically powerless under (19). For one thing, this condition implies that $\operatorname{Var}_1(D) \sim \operatorname{Var}_0(D)$, based on (19) and the bound on $R$ above, and also that $\mathbb{E}_1(D) - \mathbb{E}_0(D) \ll \sqrt{\operatorname{Var}_1(D) \vee \operatorname{Var}_0(D)}$. It therefore suffices to prove that $D$ is asymptotically normal both under the null and under the alternative. We have $D = \sum_i Z_i^2$, where $Z_i^2 := (X_i - \lambda_i)^2/\lambda_i$, and these being independent random variables, it suffices to verify Lyapunov's conditions. Some straightforward calculations yield

$$\mathbb{E}_0\big(Z_i^2 - \mathbb{E}_0(Z_i^2)\big)^4 = \mathbb{E}_0(Z_i^2 - 1)^4 \le C\Big(1 + \frac{1}{\lambda_i} + \frac{1}{\lambda_i^2} + \frac{1}{\lambda_i^3}\Big),$$

for some constant $C > 0$, and using (8), we get

$$\operatorname{Var}_0(D)^{-2}\sum_{i=1}^n \mathbb{E}_0(Z_i^2 - 1)^4 = O(1/n^2)\,n = O(1/n) = o(1).$$

With some more work, and using ( 8 ), we also obtain 

Ei(Z2-Ei(Z2))4<C(l + e(a, + af)), 
for some constant C > 0 , so that 

n n 

Vari(Il )-2 Ei(z 2 - Ei(z2))4 = 0{l/r?) (l + e{a. + a^)) = o(l), 
i=l i=l 

which is an immediate consequence of (19). 


5.6 Proof of Proposition 5 

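When $r > (1 - \sqrt{1-\beta})^2$, there exists a $\delta > 0$ such that $r > (\sqrt{1+\delta} - \sqrt{1-\beta})^2$. Define the threshold $c_n = \sqrt{2(1+\delta)\log(n)}$. Under the null, by the union bound and Lemma 2, under (6),

$$\mathbb{P}_0(M > c_n) \le \sum_{i=1}^n \mathbb{P}_0(|Z_i| > c_n) = n^{-\delta + o(1)} = o(1).$$

Under the alternative, define $I' := \{i : X_i \sim \mathrm{Pois}(\lambda_i')\}$ and

$$p_n' := \min_{i=1,\dots,n} p_{i,n}', \qquad p_{i,n}' := \mathbb{P}\big(\Upsilon_{\lambda_i'} > \lambda_i + c_n\sqrt{\lambda_i}\big).$$

By Lemma 2, we have $p_n' = n^{-(\sqrt{1+\delta} - \sqrt{r})^2 + o(1)}$. We then derive the following

$$\begin{aligned}
\mathbb{P}_1(M > c_n) &\ge \mathbb{P}\big(\max_{i \in I'} Z_i > c_n\big) \\
&\ge 1 - \mathbb{E}\Big[\prod_{i \in I'} (1 - p_{i,n}')\Big] \\
&\ge 1 - (1 - p_n')^{n\varepsilon/4} - o(1),
\end{aligned}$$

where in the last line we used the fact that $|I'| \sim \mathrm{Bin}(n, \varepsilon/2)$, so that $|I'| \ge n\varepsilon/4$ with probability tending to one. Since

$$(n\varepsilon)\,p_n' \to \infty, \qquad n \to \infty,$$

because $r > (\sqrt{1+\delta} - \sqrt{1-\beta})^2$ by construction, we have $\mathbb{P}_1(M > c_n) \to 1$ as $n \to \infty$, as we needed to prove.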
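The max test analyzed in this proof can be sketched in a few lines. The code below is a minimal illustration, assuming the statistic is $M = \max_i |Z_i|$ with $Z_i = (X_i - \lambda_i)/\sqrt{\lambda_i}$ and using the threshold $c_n = \sqrt{2(1+\delta)\log n}$ from the proof; the function name and the default value of `delta` are arbitrary choices, not prescriptions from the paper.

```python
import math

def max_test(counts, null_means, delta=0.1):
    """Return (M, c_n, reject) for the two-sided max test.

    M   = max_i |X_i - lam_i| / sqrt(lam_i)
    c_n = sqrt(2 * (1 + delta) * log(n)), the threshold used in the proof.
    """
    n = len(counts)
    threshold = math.sqrt(2.0 * (1.0 + delta) * math.log(n))
    m = max(abs(x - lam) / math.sqrt(lam) for x, lam in zip(counts, null_means))
    return m, threshold, m > threshold

if __name__ == "__main__":
    lam = [100.0] * 50
    quiet = [100] * 50               # every count sits exactly at its null mean
    spiked = [100] * 49 + [160]      # one strongly elevated count
    print(max_test(quiet, lam))
    print(max_test(spiked, lam))
```

The threshold is an asymptotic calibration, so at small $n$ the actual level of this rule may differ noticeably from its limit.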

5.7 Proof of Proposition 6 

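We first control the size of the statistic $T^*$ under the null. For each $z \in \mathbb{R}$, the variables $\mathbb{1}_{\{|Z_i| > z\}}, i = 1,\dots,n$, are independent Bernoulli, with respective parameters $K_{\lambda_i}(z), i = 1,\dots,n$. We can therefore apply Bernstein's inequality, to get

$$\log \mathbb{P}_0\Big(\sum_i \big(\mathbb{1}_{\{|Z_i| > z\}} - K_{\lambda_i}(z)\big) \ge t\sigma(z)\Big) \le -\frac{t^2/2}{1 + t/(3\sigma(z))}, \qquad \forall t \ge 0,$$

where $\sigma^2(z) := \sum_i K_{\lambda_i}(z)(1 - K_{\lambda_i}(z))$. Choosing $t = 2\sqrt{\log n}$ and letting $z \in \mathcal{Z}_n$ so that $\sigma(z) \ge \frac{1}{2}t$, the right-hand side is bounded by $-\frac{6}{5}\log n$. Thus, applying the union bound, we get

$$\mathbb{P}_0\big(T^* > 2\sqrt{\log n}\big) \le |\mathcal{Z}_n|\,n^{-6/5},$$

where $|\mathcal{Z}_n|$ is the cardinality of $\mathcal{Z}_n$. We now show that $|\mathcal{Z}_n|$ is subpolynomial in $n$. By Lemma 3, we have

$$K_\lambda(z) \le \exp\big[-\lambda h(1 + z/\sqrt{\lambda})\big] + \exp\big[-\lambda h(1 - z/\sqrt{\lambda})\big],$$

where $h$ is defined in that lemma, and extended as $h(t) = \infty$ when $t < 0$, so that this inequality is true for all $\lambda, z > 0$. Note that $h(1 + t) = t^2/2 + O(t^3)$ when $t = o(1)$. Take $z_n = \log n$.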
Because of (6), uniformly in $i = 1,\dots,n$, this bound on $K_{\lambda_i}(z_n)$ yields $\sigma^2(z_n) < \log n$ eventually. Hence, by monotonicity, $z < z_n$ for all $z \in \mathcal{Z}_n$. In particular, $|\mathcal{Z}_n| \le z_n$. Hence, we arrive at the conclusion that $\mathbb{P}_0(T^* > 2\sqrt{\log n}) = o(1)$.

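Suppose we are now under the alternative. We focus on the case where $r < 1$, which is more subtle. Consider $z_n(q) = \lfloor\sqrt{2q\log n}\rfloor$, defined for any $q > 0$. By Lemma 2, when (6) and (13) hold, we have $K_{\lambda_i}(z_n(q)) = n^{-q + o(1)}$ uniformly over $i$. Hence,

$$p^0_{i,n}(q) := \mathbb{P}_0(|Z_i| > z_n(q)) = K_{\lambda_i}(z_n(q)) = n^{-q + o(1)},$$

uniformly over $i$. In particular, when $q \in (0,1)$ is fixed, $\sigma^2(z_n(q)) = n^{1 - q + o(1)} \ge \log n$, eventually, in which case $z_n(q) \in \mathcal{Z}_n$. Hence, for each fixed $q \in (0,1)$, we have $T^* \ge T(z_n(q))$ for $n$ large enough, and so it suffices to prove that, for some well-chosen $q$, $\mathbb{P}_1(T(z_n(q)) \le 2\sqrt{\log n}) = o(1)$.

Assume $q > r$. By Lemma 2 again, this time under the alternative, and also assuming that (6) and (13) hold, then

$$K_{\lambda_i'}(z_n(q)) = n^{-(\sqrt{q} - \sqrt{r})^2 + o(1)}, \qquad K_{\lambda_i''}(z_n(q)) = n^{-(\sqrt{q} - \sqrt{r})^2 + o(1)},$$

uniformly over $i = 1,\dots,n$. Hence,

$$p^1_{i,n}(q) := \mathbb{P}_1(|Z_i| > z_n(q)) = (1 - \varepsilon)K_{\lambda_i}(z_n(q)) + \tfrac{\varepsilon}{2}K_{\lambda_i'}(z_n(q)) + \tfrac{\varepsilon}{2}K_{\lambda_i''}(z_n(q)).$$

It follows that

$$\mathbb{E}_1(T(z_n(q))) = \frac{\sum_i \big(p^1_{i,n}(q) - p^0_{i,n}(q)\big)}{\sqrt{\sum_i p^0_{i,n}(q)(1 - p^0_{i,n}(q))}} = \frac{n^{1 - \beta - (\sqrt{q} - \sqrt{r})^2 + o(1)}}{\sqrt{n^{1 - q + o(1)}}} = n^{1/2 + q/2 - \beta - (\sqrt{q} - \sqrt{r})^2 + o(1)}$$

and

$$\operatorname{Var}_1(T(z_n(q))) = \frac{\sum_i p^1_{i,n}(q)(1 - p^1_{i,n}(q))}{\sum_i p^0_{i,n}(q)(1 - p^0_{i,n}(q))} = O(1) \vee n^{q - \beta - (\sqrt{q} - \sqrt{r})^2 + o(1)}.$$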


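First, assume that $r < 1/4$, so that $r - (\beta - 1/2) = r - \rho_{\rm sparse}(\beta) > 0$, where the equality follows from (14) and the fact that $r < 1/4$. We take $q = 4r$ and get

$$\mathbb{E}_1(T(z_n(4r))) = n^{1/2 + r - \beta + o(1)},$$

with $r - \beta + 1/2 = r - (\beta - 1/2) > 0$, and

$$\operatorname{Var}_1(T(z_n(4r))) = O(1) \vee n^{-\beta + 3r + o(1)}.$$

By Chebyshev's inequality, we have

$$\mathbb{P}_1\big(T(z_n(4r)) \le 2\sqrt{\log n}\big) \le \frac{\operatorname{Var}_1(T(z_n(4r)))}{\big(\mathbb{E}_1(T(z_n(4r))) - 2\sqrt{\log n}\big)^2} = \frac{O(1) \vee n^{-\beta + 3r + o(1)}}{n^{1 + 2r - 2\beta + o(1)}} = \begin{cases} O(n^{-1 - 2r + 2\beta + o(1)}), & \text{if } \beta \ge 3r, \\ O(n^{\beta + r - 1 + o(1)}), & \text{if } \beta < 3r, \end{cases}$$

with $-1 - 2r + 2\beta < -1 - 2(\beta - 1/2) + 2\beta = 0$ and $\beta + r - 1 < r + 1/2 + r - 1 < 0$ since $r < 1/4$.

Now, assume that $r \ge 1/4$, which together with $r > \rho_{\rm sparse}(\beta)$ implies that $r > (1 - \sqrt{1-\beta})^2$, which in turn forces $1 - \beta - (1 - \sqrt{r})^2 > 0$. Take $r < q < 1$ such that $1 - \beta - (\sqrt{q} - \sqrt{r})^2 > 0$. Then

$$\mathbb{E}_1(T(z_n(q))) = n^{1/2 + q/2 - \beta - (\sqrt{q} - \sqrt{r})^2 + o(1)}$$

and

$$\operatorname{Var}_1(T(z_n(q))) = O(1) \vee n^{q - \beta - (\sqrt{q} - \sqrt{r})^2 + o(1)}.$$

Thus, by Chebyshev's inequality,

$$\mathbb{P}_1\big(T(z_n(q)) \le 2\sqrt{\log n}\big) \le \frac{\operatorname{Var}_1(T(z_n(q)))}{\big(\mathbb{E}_1(T(z_n(q))) - 2\sqrt{\log n}\big)^2} \le n^{(\sqrt{q} - \sqrt{r})^2 - 1 + \beta + o(1)} = o(1).$$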


5.8 Proof of Proposition 7 

Consider the situation under the null. Because of Lemma 1, $\min_i p_i$ stochastically dominates $\min_i u_i$, where $u_1,\dots,u_n$ are i.i.d. $\mathrm{Unif}(0,1)$. Therefore, under the null we have $\mathbb{P}_0(\min_i p_i \le \omega_n/n) = o(1)$ for any sequence $\omega_n = o(1)$. Take $\omega_n = 1/\log n$.
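Under the alternative, let $I' = \{i : X_i \sim \mathrm{Pois}(\lambda_i')\}$. Note that $\lambda_i h(X_i/\lambda_i) \ge \log(n/\omega_n)$ implies

$$p_i = \mathbb{P}(\Upsilon_{\lambda_i} \ge X_i \mid X_i) \le \omega_n/n,$$

where the equality is due to the fact that, necessarily, $X_i \ge 3\lambda_i$ eventually, and the inequality comes from Lemma 3. Thus, defining $q_i = \mathbb{P}\big(\lambda_i h(\Upsilon_{\lambda_i'}/\lambda_i) \ge \log(n/\omega_n)\big)$, we arrive at

$$\begin{aligned}
\mathbb{P}_1\big(\min_i p_i > \omega_n/n\big) &\le \mathbb{P}\big(\min_{i \in I'} p_i > \omega_n/n\big) \\
&\le \mathbb{E}\Big[\prod_{i \in I'} (1 - q_i)\Big] \\
&\le (1 - q_{\min})^{n\varepsilon/4} + o(1),
\end{aligned}$$

where $q_{\min} := \min_{i=1,\dots,n} q_i$, and in the last line we used the fact that $|I'| \sim \mathrm{Bin}(n, \varepsilon/2)$, so that $|I'| \ge n\varepsilon/4$ with probability tending to one. Note that

$$q_i = \mathbb{P}(\Upsilon_{\lambda_i'} \ge b_i), \qquad b_i := \lambda_i h^{-1}\big(\log(n/\omega_n)/\lambda_i\big),$$

where for $t \ge 0$, $h^{-1}(t)$ is defined as the unique $x \ge 1$ such that $h(x) = t$. Notice that $h^{-1}(t) \sim t/\log t$ when $t \to \infty$. Let $C_i = \log n/\lambda_i$, so that $C_{\min} := \min_i C_i \to \infty$ when (7) holds. We have

$$b_i/\lambda_i \sim \log n/(\lambda_i \log C_i) = C_i/\log C_i \ge C_{\min}/\log C_{\min} \to \infty.$$

Therefore, applying the first lower bound in Lemma 3, we get

$$\log q_i \ge -\lambda_i' h(\lceil b_i \rceil/\lambda_i') - \tfrac{1}{2}\log\lceil b_i \rceil - 1 \sim -b_i \log(b_i/\lambda_i') \sim -\frac{\log n}{\log C_i}\log\big(C_i^{1-\gamma}\big) = -(1 - \gamma)\log n,$$

uniformly over $i = 1,\dots,n$ because $\min_i\big(b_i \wedge (b_i/\lambda_i') \wedge C_i\big) \to \infty$. In particular, $q_{\min} \ge n^{-(1-\gamma) + o(1)}$, implying that $n\varepsilon q_{\min} \to \infty$, because $\gamma > \beta$ by assumption. We conclude that $\mathbb{P}_1(\min_i p_i > \omega_n/n) = o(1)$, as we needed to prove.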
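The Bonferroni rule from this proof, rejecting when $\min_i p_i \le \omega_n/n$ with $\omega_n = 1/\log n$, can be sketched as follows. As an assumption, the code uses the upper-tail P-values $p_i = \mathbb{P}(\Upsilon_{\lambda_i} \ge X_i)$, which the proof shows coincide with the paper's P-values once $X_i \ge 3\lambda_i$; the paper's definition (24) is not reproduced here, and all function names are hypothetical.

```python
import math

def pois_pmf(k, lam):
    """Poisson(lam) probability mass at the integer k (lam > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def pois_sf(x, lam):
    """Upper tail P(Poisson(lam) >= x) for an integer x."""
    if x <= 0:
        return 1.0
    if x <= lam:
        return 1.0 - sum(pois_pmf(k, lam) for k in range(x))
    total, k, term = 0.0, x, pois_pmf(x, lam)
    while term > 0.0 and term >= 1e-17 * total:
        total += term
        k += 1
        term *= lam / k
    return total

def bonferroni_test(counts, null_means):
    """Return (min P-value, cutoff, reject) with cutoff omega_n / n, omega_n = 1 / log(n).

    Assumption: upper-tail P-values stand in for the paper's definition (24).
    """
    n = len(counts)
    cutoff = (1.0 / math.log(n)) / n
    p_min = min(pois_sf(x, lam) for x, lam in zip(counts, null_means))
    return p_min, cutoff, p_min <= cutoff

if __name__ == "__main__":
    lam = [1.0] * 100
    print(bonferroni_test([1] * 100, lam))          # nothing unusual
    print(bonferroni_test([1] * 99 + [15], lam))    # one very large count
```

With small null means a single large count is enough to drive the minimum P-value below the cutoff, which matches the regime where this method is shown to be powerful.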


6 The one-sided setting 

Up until now, we considered a two-sided setting, partly motivated by the important example of goodness-of-fit testing, where Pearson's chi-squared test is omnipresent. Simpler is a one-sided setting, where instead of (1) we have

$$X_i \sim (1 - \varepsilon)\,\mathrm{Pois}(\lambda_i) + \varepsilon\,\mathrm{Pois}(\lambda_i'), \qquad (36)$$

together with $\lambda_i' = \lambda_i + \Delta_i$ and $\varepsilon \in [0,1]$, and address the problem (3) in this context. Such a model may be relevant in some image processing applications where the goal is to detect an anomaly in the form of pixels with higher intensity.

6.1 Dense Regime 

In the dense regime where (9) holds with $\beta < 1/2$, we consider the same parameterization (10). Define

$$\rho^{\rm one}_{\rm dense}(\beta) = \beta - \tfrac{1}{2}. \qquad (37)$$

Proposition 8. Consider the testing problem (3) in the one-sided setting (36), with parameterizations (9) with $\beta < 1/2$ and (10). All tests are asymptotically powerless if

$$s < \rho^{\rm one}_{\rm dense}(\beta). \qquad (38)$$

The proof is parallel to that of Proposition 1 (in fact simpler) and is omitted. We note that this detection boundary is in direct correspondence with that in the normal model (Cai et al., 2011).

In the one-sided setting, the chi-squared test does not achieve the detection boundary. However, its one-sided version does. Indeed, consider the test that rejects for large values of

$$\sum_{i=1}^n \frac{X_i - \lambda_i}{\sqrt{\lambda_i}}. \qquad (39)$$


Proposition 9. Consider the testing problem (3) in the one-sided setting (36), with (8), and let $a_i = \Delta_i/\sqrt{\lambda_i}$. The test based on (39) is asymptotically powerful if (18) holds. In particular, with parameterization (9) with $\beta < 1/2$ and (10), the test is asymptotically powerful when $s > \rho^{\rm one}_{\rm dense}(\beta)$.

The proof is parallel to, and in fact much simpler than, that of Proposition 4, and is omitted. All the arguments are simpler in the one-sided setting, so much so that we are able to analyze Fisher's method. In the one-sided setting, instead of (24), define the P-values as in (26). Note that Lemma 1 still applies.

Proposition 10. Consider the testing problem (3) in the one-sided setting (36), with (8), and let $a_i = \Delta_i/\sqrt{\lambda_i}$. Fisher's test (based on (25)) is asymptotically powerful if

$$\varepsilon\sum_i (a_i \wedge 1) \gg \sqrt{n}.$$

In particular, with parameterization (9) with $\beta < 1/2$ and (10), Fisher's test is asymptotically powerful when $s > \rho^{\rm one}_{\rm dense}(\beta)$.

To streamline the proof, which is somewhat long and technical, we implicitly focused on the most interesting case where the $a_i$'s are bounded, but this is not intrinsic to the method. In fact, the test has increasing power with respect to each $a_i$. The technical proof is detailed in Section 6.3.
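Fisher's statistic in the one-sided setting is $V = \sum_i -2\log G_{\lambda_i}(X_i)$, with $G_\lambda(x) = \mathbb{P}(\Upsilon_\lambda \ge x)$. The sketch below computes it. The rejection threshold $2n + 2z\sqrt{n}$ is an assumption made for illustration only: it is a normal approximation to the $\chi^2_{2n}$ law that $V$ would have with exactly uniform P-values, and it is conservative here because discrete P-values are stochastically larger than uniform. The function names and the default `z` are hypothetical.

```python
import math

def pois_pmf(k, lam):
    """Poisson(lam) probability mass at the integer k (lam > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def pois_sf(x, lam):
    """Upper tail G_lam(x) = P(Poisson(lam) >= x) for an integer x."""
    if x <= 0:
        return 1.0
    if x <= lam:
        return 1.0 - sum(pois_pmf(k, lam) for k in range(x))
    total, k, term = 0.0, x, pois_pmf(x, lam)
    while term > 0.0 and term >= 1e-17 * total:
        total += term
        k += 1
        term *= lam / k
    return total

def fisher_statistic(counts, null_means):
    """V = sum_i -2 * log G_{lam_i}(X_i), with a floor to avoid log(0)."""
    return sum(-2.0 * math.log(max(pois_sf(x, lam), 1e-300))
               for x, lam in zip(counts, null_means))

def fisher_test(counts, null_means, z=3.0):
    """Return (V, threshold, reject); the threshold 2n + 2*z*sqrt(n) is an assumed calibration."""
    n = len(counts)
    v = fisher_statistic(counts, null_means)
    threshold = 2.0 * n + 2.0 * z * math.sqrt(n)
    return v, threshold, v > threshold

if __name__ == "__main__":
    lam = [4.0] * 50
    print(fisher_test([4] * 50, lam))   # counts at the null mean
    print(fisher_test([8] * 50, lam))   # every count moderately elevated
```

Because the statistic sums evidence over all coordinates, many moderate shifts can trigger rejection even when no single count is extreme, which is the dense-regime behavior described above.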


6.2 Sparse Regime 


In the sparse regime, the same results apply. In particular, the detection boundary described in Propositions 2 and 3 applies. The max test (now based on $\max_i Z_i$) and Bonferroni's method achieve the detection boundary in the very sparse regime ($\beta > 3/4$). The higher criticism is now based on

$$T^* = \sup_{x \in \mathcal{X}_n} T(x), \qquad T(x) := \frac{\sum_i \big(\mathbb{1}_{\{X_i \ge x\}} - G_{\lambda_i}(x)\big)}{\sqrt{\sum_i G_{\lambda_i}(x)(1 - G_{\lambda_i}(x))}},$$

with definition (26) and

$$\mathcal{X}_n = \Big\{x \in \mathbb{N} : \sum_i G_{\lambda_i}(x)(1 - G_{\lambda_i}(x)) \ge \log n\Big\},$$

and it achieves the detection boundary over the whole sparse regime ($\beta > 1/2$). The technical arguments are parallel, and in fact simpler, and are omitted.
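The one-sided higher criticism statistic can be computed by scanning integer thresholds $x$ and keeping those whose null variance $\sum_i G_{\lambda_i}(x)(1 - G_{\lambda_i}(x))$ is at least $\log n$. The following is a minimal sketch under that reading, with $G_\lambda(x) = \mathbb{P}(\Upsilon_\lambda \ge x)$. The scan range is capped just above the largest observed count and the largest null mean, which is a simplifying assumption of this sketch; function names are hypothetical.

```python
import math

def pois_pmf(k, lam):
    """Poisson(lam) probability mass at the integer k (lam > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def pois_sf(x, lam):
    """Upper tail G_lam(x) = P(Poisson(lam) >= x) for an integer x."""
    if x <= 0:
        return 1.0
    if x <= lam:
        return 1.0 - sum(pois_pmf(k, lam) for k in range(x))
    total, k, term = 0.0, x, pois_pmf(x, lam)
    while term > 0.0 and term >= 1e-17 * total:
        total += term
        k += 1
        term *= lam / k
    return total

def higher_criticism_one_sided(counts, null_means):
    """Supremum of T(x) over scanned integers x with null variance >= log(n).

    Returns None when no scanned threshold meets the variance requirement.
    """
    n = len(counts)
    log_n = math.log(n)
    x_max = max(max(counts), math.ceil(max(null_means))) + 1  # assumed scan cap
    best = None
    for x in range(x_max + 1):
        tails = [pois_sf(x, lam) for lam in null_means]
        variance = sum(g * (1.0 - g) for g in tails)
        if variance < log_n:
            continue
        exceed = sum(1 for c in counts if c >= x)
        t = (exceed - sum(tails)) / math.sqrt(variance)
        if best is None or t > best:
            best = t
    return best

if __name__ == "__main__":
    lam = [20.0] * 200
    pattern = [15, 18, 20, 22, 25, 17, 23, 19, 21, 20]
    base = [pattern[i % 10] for i in range(200)]
    planted = [45] * 40 + base[40:]     # 40 coordinates with elevated counts
    print(higher_criticism_one_sided(base, lam))
    print(higher_criticism_one_sided(planted, lam))
```

Raising any count can only increase $T(x)$ at every threshold, so the planted sample never scores lower than the base sample. One natural comparison value is $2\sqrt{\log n}$, the threshold used for the two-sided statistic in the proof of Proposition 6.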
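6.3 Proof of Proposition 10

Let $V$ be the statistic (25). We seek to apply Lemma 5, which is based on the first two moments, under the null and under the alternative. In what follows, $\lambda \ge 1$ and $\lambda' = \lambda + a\sqrt{\lambda}$ with $0 < a \le 1$.

Difference in means. For $\lambda > 0$, $g_\lambda(x) = \mathbb{P}(\Upsilon_\lambda = x)$, $G_\lambda(x) = \mathbb{P}(\Upsilon_\lambda \ge x)$, and $F_\lambda(X) = -2\log G_\lambda(X)$. We have

$$\mathbb{E}_\lambda(F_\lambda) = -2\sum_{x \ge 0} [\log G_\lambda(x)]\,g_\lambda(x) = 2\sum_{x \ge 1} [\log G_\lambda(x-1) - \log G_\lambda(x)]\,G_\lambda(x),$$

using the fact that $g_\lambda(x) = G_\lambda(x) - G_\lambda(x+1)$ and $G_\lambda(0) = 1$. A similar expression holds for $\mathbb{E}_{\lambda'}(F_\lambda)$, and combined, we get

$$\begin{aligned}
\mathbb{E}_{\lambda'}(F_\lambda) - \mathbb{E}_\lambda(F_\lambda) &= 2\sum_{x \ge 1} [\log G_\lambda(x-1) - \log G_\lambda(x)]\,[G_{\lambda'}(x) - G_\lambda(x)] \\
&= 2\sum_{x \ge 1} \log\Big(1 + \frac{g_\lambda(x-1)}{G_\lambda(x)}\Big)\,[G_{\lambda'}(x) - G_\lambda(x)].
\end{aligned}$$

In that case, the summands are positive, since $\log G_\lambda(x-1) > \log G_\lambda(x)$ by monotonicity of $G_\lambda$, and $G_{\lambda'}(x) > G_\lambda(x)$ by the fact that $\Upsilon_{\lambda'}$ stochastically dominates $\Upsilon_\lambda$ when $\lambda' > \lambda$. To get a lower bound, we may thus restrict the sum to any subset of $x$'s, and we choose $x \in I_\lambda := [\lambda, \lambda + \sqrt{\lambda}]$.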
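Since $\lambda \ge 1$, $I_\lambda \cap \mathbb{Z} \ne \emptyset$. Moreover,

$$\frac{1}{C_0} \le G_\lambda(x) \le C_0, \qquad \forall x \in I_\lambda,$$

for some universal constant $C_0 > 1$. This is a direct consequence of Lemma 4 when $\lambda \ge \lambda_0$ for some large-enough constant $\lambda_0$, and otherwise, it comes from the fact that $G_\lambda(x) > 0$ for all pairs $(\lambda, x)$ such that $\lambda \le \lambda_0$ and $x \in I_\lambda$, which is a finite set of pairs. We also have

$$\frac{1}{C_1\sqrt{\lambda}} \le g_\lambda(x) \le \frac{C_1}{\sqrt{\lambda}}, \qquad \forall x \in [\lambda - 1, \lambda + \sqrt{\lambda}],$$

for a numeric constant $C_1 > 1$. Indeed, by Stirling's formula, we have $g_\lambda(x) \asymp x^{-1/2}\exp(-\lambda h(x/\lambda))$, where we recall that $h(x) = x\log x - x + 1$, and we have $x^{-1/2} \asymp \lambda^{-1/2}$ and also $\exp(-\lambda h(x/\lambda)) \asymp 1$, uniformly over $x \in I_\lambda$. We also have

$$\frac{g_\nu(x)}{g_\lambda(x)} \ge 1/C_2, \qquad \forall x \in I_\lambda,\ \forall \nu \in [\lambda, \lambda'],$$

for a numeric constant $C_2 > 1$. Indeed,

$$\frac{g_\nu(x)}{g_\lambda(x)} = \exp[-\nu + \lambda + x\log(\nu/\lambda)] \ge \exp[-\nu + \lambda + \lambda\log(\nu/\lambda)] \ge \exp\big[-a\sqrt{\lambda} + \lambda\log(1 + a/\sqrt{\lambda})\big],$$

which is bounded from below when $a$ is bounded from above. Using the fact that $\partial_\lambda G_\lambda(x) = g_\lambda(x-1)$, by the mean-value theorem, we also have $G_{\lambda'}(x) - G_\lambda(x) = (\lambda' - \lambda)\,g_{\lambda_x}(x-1)$, for some $\lambda_x \in [\lambda, \lambda']$, which together with the last two bounds implies that

$$G_{\lambda'}(x) - G_\lambda(x) \ge a/C_3, \qquad \forall x \in I_\lambda,$$

for a numeric constant $C_3 > 1$. Gathering all these results, we derive

$$\mathbb{E}_{\lambda'}(F_\lambda) - \mathbb{E}_\lambda(F_\lambda) \ge 2\sum_{x \in I_\lambda \cap \mathbb{Z}} \log\Big(1 + \frac{1}{C_0 C_1\sqrt{\lambda}}\Big)\,\frac{a}{C_3} \ge \frac{a}{C_4},$$

for another constant $C_4 \ge 1$, because $|I_\lambda \cap \mathbb{Z}| \asymp \sqrt{\lambda}$.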

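Variances. When $X \sim g_\lambda$, $G_\lambda(X)$ stochastically dominates $U \sim \mathrm{Unif}(0,1)$, and because $t \mapsto (\log t)^2$ is decreasing on $(0,1)$, we have

$$\mathbb{E}_\lambda(F_\lambda^2) \le C_5 := 4\,\mathbb{E}[(\log U)^2] < \infty.$$

Let $R_{\lambda,\lambda'}(X) = g_{\lambda'}(X)/g_\lambda(X)$. We have

$$\mathbb{E}_{\lambda'}(F_\lambda^2) = \mathbb{E}_\lambda[F_\lambda^2 R_{\lambda,\lambda'}] \le 2\,\mathbb{E}_\lambda(F_\lambda^2) + \mathbb{E}_\lambda\big[F_\lambda^2 R_{\lambda,\lambda'} \mathbb{1}_{\{R_{\lambda,\lambda'} > 2\}}\big].$$

Note that $R_{\lambda,\lambda'}(x) > 2$ if, and only if, $x > x_* := (\Delta + \log 2)/\log(1 + \Delta/\lambda)$, where $\Delta = \lambda' - \lambda$. Hence,

$$\mathbb{E}_\lambda\big[F_\lambda^2 R_{\lambda,\lambda'} \mathbb{1}_{\{R_{\lambda,\lambda'} > 2\}}\big] = 4\sum_{x > x_*} [\log G_\lambda(x)]^2 g_{\lambda'}(x).$$

Lemma 7 (Bohman's inequality, as in Sec 35.1.8 of DasGupta (2008)). For any $\lambda > 0$,

$$\mathbb{P}(\Upsilon_\lambda \ge x) \ge \bar\Phi\Big(\frac{x - \lambda}{\sqrt{\lambda}}\Big), \qquad \forall x \in \mathbb{N},$$

where $\bar\Phi$ denotes the standard normal survival function.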
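Bohman's inequality, read here as $\mathbb{P}(\Upsilon_\lambda \ge x) \ge \bar\Phi((x - \lambda)/\sqrt{\lambda})$ with $\bar\Phi$ the standard normal survival function, can be checked numerically on a grid. The sketch below is only a sanity check under that reading of the lemma; the function names are hypothetical.

```python
import math

def pois_pmf(k, lam):
    """Poisson(lam) probability mass at the integer k (lam > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def pois_sf(x, lam):
    """Upper tail P(Poisson(lam) >= x) for an integer x."""
    if x <= 0:
        return 1.0
    if x <= lam:
        return 1.0 - sum(pois_pmf(k, lam) for k in range(x))
    total, k, term = 0.0, x, pois_pmf(x, lam)
    while term > 0.0 and term >= 1e-17 * total:
        total += term
        k += 1
        term *= lam / k
    return total

def normal_sf(t):
    """Standard normal survival function, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def bohman_gap(x, lam):
    """Poisson upper tail minus its normal lower bound; nonnegative if the inequality holds."""
    return pois_sf(x, lam) - normal_sf((x - lam) / math.sqrt(lam))

if __name__ == "__main__":
    for lam in (1.0, 10.0):
        for x in (0, 1, 5, 10, 20):
            print(lam, x, bohman_gap(x, lam))
```

The gap grows in relative terms far in the upper tail, where the Poisson tail is heavier than the normal one.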

This lemma, together with Mills ratio, yields 

^ [logGA(x)]2ffA'(a;) = 0(1) ®“^'^^exp[-A/i(x/A)], 

X>X:t. X>X* ^ 


since, for any x > x*, > t* := x 1/a > 1. We learn in (Shorack and Wellner, 1986, Prop 

1, p. 441) that h{l + t) > ^^^(1 + |i)~^ for all t > 0. Hence, 

1 ( _ \\2 o 

AMx/A) > ^ > ^^1{.<4A} + 4(X - A)1{.>4A}. 

3 A 

Thus 


E 

X>Xf 


X — A 


X exp[— A/i(x/A)] < 


X — A\4 


x*<x<4A 


+ E 

X>A\ 


a/A 

X — A\4 


X exp 


x-'/2gxp 


{x - A ) 
4A 




2n 


The first sum is bounded by 

[SvfA] [A+(i+l)\/AJ 

(i + l)^e"*'/^< X;(i + l)V''0 = o(l). 

t=t* x=[A+t'\/AJ t>t* 


The second sum is bounded by 

A-^/2 Y,ix- A)^e-t(^-^) = A-'^/^ ^4g-|x < 

x>4iX x>3X 
for a numeric constant $C_6$, since $\lambda \ge 1$. We conclude that

$$\mathbb{E}_{\lambda'}(F_\lambda^2) \le C_7,$$

for some numeric constant $C_7$.

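Conclusion. Since the test has increasing power with respect to each $a_i$, we may assume that $a_i \le 1$ for all $i$. Let $F_{\lambda_i} = -2\log G_{\lambda_i}(X_i)$ and notice that $V = \sum_i F_{\lambda_i}$ is our test statistic. We have

$$\mathbb{E}_1(V) - \mathbb{E}_0(V) = \sum_i \big[\mathbb{E}_1(F_{\lambda_i}) - \mathbb{E}_0(F_{\lambda_i})\big] = \varepsilon\sum_i \big[\mathbb{E}_{\lambda_i'}(F_{\lambda_i}) - \mathbb{E}_{\lambda_i}(F_{\lambda_i})\big] \ge \frac{\varepsilon}{C_4}\sum_i a_i,$$

and

$$\operatorname{Var}_0(V) \le \sum_i \mathbb{E}_{\lambda_i}(F_{\lambda_i}^2) \le n C_5,$$

as well as

$$\operatorname{Var}_1(V) \le \sum_i \mathbb{E}_1(F_{\lambda_i}^2) \le \sum_i \big[\mathbb{E}_{\lambda_i}(F_{\lambda_i}^2) \vee \mathbb{E}_{\lambda_i'}(F_{\lambda_i}^2)\big] \le n(C_5 \vee C_7).$$

By Lemma 5, we conclude that the test is asymptotically powerful when

$$\varepsilon\sum_i a_i \gg \sqrt{n}.$$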


7 Discussion 

We drew a strong parallel between the Poisson means model and the normal means model. The correspondence is in fact exact when all the $\lambda_i$'s are at least logarithmic in $n$. When the $\lambda_i$'s are smaller, we uncovered a new detection boundary in the sparse regime. We studied the chi-squared test, the max test and the higher criticism, which are shown here to have similar properties as in the normal model. Motivated by the higher criticism, we also advocated a multiple testing approach to the Poisson means model, and studied emblematic approaches such as Fisher's and Bonferroni's methods, which are indeed shown to achieve the detection boundary in some regime/model. An open direction might be to adapt the method of Meinshausen and Rice (2006) for estimating the number of non-null effects in the Poisson means model.